I have an application where I want to use a reliable message digest algorithm, such as SHA-1 or MD5. Both of these are implemented in Java 1.4, and I have some sample code and results from the IBM developerWorks site. However, when I compile and run the code on a Sun box, the message digest doesn't match the expected results. It appears to be a code-page issue.
Can these Java message digest implementations be used so that they generate the same results across platforms, regardless of code-page differences?
The problem isn't the hash algorithms; it's what we call "text canonicalization." A hash is computed over bytes, not characters, so the same text encoded under two different code pages produces two different digests. You have to account for code-page differences before hashing, either by translating the text into a known "canonical form" or by making sure *no* translation happens before hashing. Either approach is acceptable, as long as every platform computes the hash over the same actual data.
OpenPGP (for which I'm a spec author) specifies that all text is in the UTF-8 encoding of Unicode.
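As a rough illustration of the canonical-form approach, the sketch below encodes text as UTF-8 explicitly before hashing, so the digest does not depend on the platform's default code page. The class name `CanonicalDigest` and the method `sha1Hex` are hypothetical names chosen for this example, and `StandardCharsets` assumes Java 7 or later; on Java 1.4 the rough equivalent would be `text.getBytes("UTF-8")`. It is not the code from the original sample.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
public class CanonicalDigest {

    // Hash the text after encoding it with an explicitly named charset.
    // Calling text.getBytes() with no argument would use the platform
    // default encoding, which is where cross-platform mismatches come from.
    static String sha1Hex(String text, Charset charset) throws NoSuchAlgorithmException {
        byte[] bytes = text.getBytes(charset);
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] digest = md.digest(bytes);

        // Render the digest as lowercase hex.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Canonical form assumed for this sketch: UTF-8.
    static String sha1Hex(String text) throws NoSuchAlgorithmException {
        return sha1Hex(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String text = "caf\u00e9"; // contains a non-ASCII character
        // Same characters, different byte encodings, different digests.
        System.out.println("UTF-8:      " + sha1Hex(text, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + sha1Hex(text, StandardCharsets.ISO_8859_1));
    }
}
```

Pure ASCII text hashes identically under most code pages, which is why this kind of mismatch often appears only once accented or other non-ASCII characters are in the input.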