When processing an order, we have to search for a match of the submitted credit card number against a list of hotcards (stolen or blocked cards) provided by third parties. According to the PCI (Payment Card Industry) Data Security Standard, even if a card is on the list, we cannot leave its number in plain text. We are considering transforming these numbers into their hash values using SHA-1 (discarding the list in clear text) and then performing the comparison when we process a payment. We like SHA-1 because it does not require any key management function. However, we are worried about hash collisions or false positives that would erroneously identify a good card as a hotcard. Is using SHA-1 like this a good approach?
Using SHA-1 to create hashes of credit card numbers to avoid storing them in cleartext is fine. Additionally, because the chances of a hash collision are so low, it's unlikely you'll get a false positive. Let's look at SHA-1 and why it is safe to use in this scenario.
The SHA (Secure Hash Algorithm) family is a set of related cryptographic hash functions designed by the National Security Agency (NSA) and published by NIST. The algorithm creates a fixed-length hash value from any kind of data, such as a file, password, or in this case, a credit card number. This value is virtually unique to the input data, and even a small change in the data will result in a completely different hash due to the avalanche effect. Also, there is no practical way to calculate a particular data input that will result in a desired hash value, and the function cannot be run in reverse to recover the original data from the hash value. The most commonly used function in the family is SHA-1, and it is employed in a large variety of popular security applications and protocols, including SSL, PGP, S/MIME and IPsec.
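To make the avalanche effect concrete, here is a minimal sketch using Python's standard hashlib module. The two inputs are illustrative test-style numbers that differ only in the last digit, not real card data, and the helper names are my own.

```python
import hashlib


def sha1_hex(card_number: str) -> str:
    """Return the SHA-1 digest of a card number as 40 hex characters."""
    return hashlib.sha1(card_number.encode("ascii")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 160 digest bits differ between two digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Illustrative inputs only: they differ in a single trailing digit.
digest_a = sha1_hex("4111111111111111")
digest_b = sha1_hex("4111111111111112")

print(digest_a)
print(digest_b)
print(differing_bits(digest_a, digest_b), "of 160 bits differ")
```

For a well-behaved hash function you would expect roughly half of the 160 bits to differ, even though the inputs are nearly identical.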
Your concern about hash collisions is likely a result of an attack announced in August 2005 that required fewer than 2^63 (9,223,372,036,854,775,808) hash computations to find collisions in the full version of SHA-1. A collision means two pieces of data have the same hash value. This attack requires less computation than a brute-force search for a collision, which would require about 2^80 computations, and is therefore considered a break in academic cryptography. Although some observers are concerned that finding a collision for SHA-1 is within reach of a massive distributed Internet search, that doesn't necessarily mean the attack is practically exploitable. Regardless, it is interesting to note that in September 2005, Microsoft announced it was banning the use of the DES, MD4, MD5 and, in some cases, SHA-1 cryptographic algorithms in any functions.
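The gap between those two figures is easier to appreciate with the arithmetic written out. The short sketch below only restates the numbers quoted above; the variable names are my own.

```python
# Work factor of the collision attack announced in August 2005.
attack_work = 2 ** 63

# Generic brute-force (birthday) collision search on a 160-bit hash
# needs about 2^(160/2) = 2^80 hash computations.
brute_force_work = 2 ** (160 // 2)

# How many times cheaper the attack is than brute force.
speedup = brute_force_work // attack_work

print(f"{attack_work:,}")        # 9,223,372,036,854,775,808
print(f"{brute_force_work:,}")   # 1,208,925,819,614,629,174,706,176
print(f"{speedup:,}")            # 131,072
```

In other words, the attack is about 2^17 times cheaper than brute force, which matters to cryptographers but still leaves an enormous amount of work.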
So, why is it still safe to use SHA-1 to hash your credit card numbers? First, the chances of two credit card numbers having the same hash value are so small that you are very unlikely to ever see the hash of a good card number match the hash of a hotcard, which effectively rules out false positives. Second, the 2005 attack is a collision attack, not a pre-image attack. As I previously mentioned, a collision attack finds two pieces of data with the same hash, but the attacker can't pick what the hash will be and therefore cannot break tools that use SHA-1 to check for changes in hashed data. A pre-image attack, on the other hand, would enable someone to find a bad credit card number that produces the same hash value as a valid card number. Even then, because you are using a blacklist, the attacker gains nothing: the comparison process would still find the bad card number on the blacklist.
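The comparison process described in the question could be sketched roughly as follows. This is an illustration under stated assumptions, not a production design: the function names are hypothetical, the sample numbers are illustrative test-style values, and I assume card numbers are normalized to digits only before hashing so that formatting differences do not change the hash.

```python
import hashlib
import string


def normalize(card_number: str) -> str:
    """Keep only ASCII digits so '4111 1111...' and '4111-1111...' hash alike."""
    return "".join(ch for ch in card_number if ch in string.digits)


def card_digest(card_number: str) -> bytes:
    """SHA-1 digest (20 bytes) of the normalized card number."""
    return hashlib.sha1(normalize(card_number).encode("ascii")).digest()


def build_hotlist(card_numbers) -> frozenset:
    """Hash each hotcard once; the clear-text list can then be discarded."""
    return frozenset(card_digest(number) for number in card_numbers)


def is_hotcard(card_number: str, hotlist: frozenset) -> bool:
    """Hash the submitted number and look it up in the hashed hotlist."""
    return card_digest(card_number) in hotlist


# Illustrative values only, not real hotcards.
hotlist = build_hotlist(["4111 1111 1111 1111", "5555-5555-5555-4444"])

print(is_hotcard("4111111111111111", hotlist))  # True
print(is_hotcard("4012888888881881", hotlist))  # False
```

A false positive would require the submitted number to hash to exactly the same 160-bit value as one of the listed numbers, which is the accidental collision discussed above.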
If you are still concerned, you could consider using SHA-224, SHA-256, SHA-384 or SHA-512, sometimes collectively referred to as SHA-2. This would require extra storage space, however, because SHA-1 produces a 160-bit hash value while SHA-224, for example, produces a 224-bit value. Also, the comparison process would be a little slower.
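To put the storage tradeoff and the collision odds side by side, the sketch below prints the digest size of each function and a rough birthday-bound estimate of an accidental collision. The estimate n^2 / 2^(bits+1) is a standard approximation, and the figure of 10^16 inputs is my own deliberately generous assumption (every possible 16-digit number), not a count of real cards.

```python
import hashlib

# Assumption: an upper bound of every possible 16-digit number.
CARD_SPACE = 10 ** 16


def accidental_collision_bound(n_inputs: int, digest_bits: int) -> float:
    """Birthday-bound approximation that any two of n inputs share a hash."""
    return n_inputs ** 2 / 2 ** (digest_bits + 1)


for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    size_bytes = hashlib.new(name).digest_size
    bound = accidental_collision_bound(CARD_SPACE, size_bytes * 8)
    print(f"{name:7} {size_bytes:3d} bytes per digest, collision bound ~ {bound:.1e}")
```

Even for SHA-1 the bound is well below one in 10^16, so moving to a longer hash buys extra margin against future attacks rather than fixing a false-positive problem.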