I have an application where I want to use a reliable message digest algorithm, such as SHA-1 or MD5. Both of these are implemented in Java 1.4, and I have some sample code and results from the IBM developerWorks site. However, when I compile and run the code on a Sun box, the message digest doesn't match the expected results. It appears to be a code-page issue.
Can these Java message digest algorithm implementations be used in such a manner as to generate the same results across platforms and control for code-page differences?
The problem isn't the hash algorithms; it is what we call "text canonicalization." A hash is computed over bytes, not characters, so the same text encoded under two different code pages produces two different digests. You have to account for code-page differences before hashing, either by translating the text into some known "canonical form" or by remembering *not* to do any translation, so that the hash is computed over the actual data bytes unchanged. Either is an acceptable way to solve the problem, as long as every platform ends up hashing the same bytes.
OpenPGP (for which I'm a spec author) takes the canonical-form approach: it specifies that all text is in the UTF-8 encoding of Unicode.
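As a minimal sketch of the canonical-form approach in Java, the example below encodes the text as UTF-8 explicitly before hashing, rather than relying on the platform's default code page. The class name `CanonicalDigest` and the helper `sha1Hex` are hypothetical names chosen for illustration, and `StandardCharsets` assumes Java 7 or later; on Java 1.4 the equivalent call would be `text.getBytes("UTF-8")`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class CanonicalDigest {

    // Hypothetical helper: hashes text over its UTF-8 bytes so the digest
    // does not depend on the platform's default code page.
    static String sha1Hex(String text) {
        try {
            // Canonicalize: always encode as UTF-8. Avoid text.getBytes()
            // with no argument, which uses the platform default charset.
            byte[] canonical = text.getBytes(StandardCharsets.UTF_8);

            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(canonical);

            // Render the digest as lowercase hex for easy comparison.
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard algorithm name, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Standard SHA-1 test vector for "abc":
        // prints a9993e364706816aba3e25717850c26c9cd0d89d
        System.out.println(sha1Hex("abc"));
    }
}
```

If the data is not text at all, or arrives as bytes from a file or socket, the other option applies: pass those bytes straight to `MessageDigest` without converting them to a `String` first.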