How does canonicalization work? What should I do to prevent this input validation attack?
Canonicalization is a fancy word for a fairly simple concept. It stems from the fact that there are more than 47 gazillion ways to encode characters for the Web today. Some of the most popular are UTF-8 (described in detail in RFC 2279, since superseded by RFC 3629), UTF-16 (RFC 2781), and so on. A single character, such as a dot (.), may be represented in many different ways, such as the ASCII byte 0x2E, the overlong two-byte UTF-8 sequence C0 AE (which lenient decoders turn back into a dot), and many others. The problem is that, with all of these different ways of encoding user input, a Web application's filters can easily be confused if they're not carefully built. For example, if you wanted to filter dots but only removed ASCII 0x2E, someone could send the overlong sequence C0 AE and squeeze a dot past your filter. Web applications often filter user input for evil characters, like quotes that might be part of SQL injection attacks or the angle brackets (< and >) of script tags that might be part of a cross-site scripting attack. However, if the filter only recognizes these characters in one encoding, such as UTF-8, an attacker could use another encoding, like UTF-16, to encode the evil characters and bypass the filter.
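To make the dot example concrete, here is a minimal Python sketch. Both the filter and the decoder are hypothetical: `lenient_utf8_decode` imitates an old, non-conforming decoder that accepts overlong two-byte sequences, while Python's built-in UTF-8 codec is strict and refuses them. The path `../config` is an arbitrary illustrative input.

```python
def naive_filter(raw: bytes) -> bool:
    """Hypothetical filter: accepts input only if it has no ASCII dot (byte 0x2E)."""
    return b"." not in raw


def lenient_utf8_decode(raw: bytes) -> str:
    """Hypothetical non-conforming decoder: handles ASCII and two-byte sequences
    but never rejects overlong forms (and does no other validation),
    so C0 AE decodes to '.'."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:
            # Plain ASCII byte.
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(raw):
            # Two-byte sequence: 5 payload bits from the lead byte, 6 from the next.
            # A conforming decoder would reject lead bytes C0 and C1 as overlong.
            out.append(chr(((b & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b:02X} at offset {i}")
    return "".join(out)


attack = b"\xc0\xae\xc0\xae/config"  # two overlong dots, then "/config"

print(naive_filter(attack))         # True: no 0x2E byte, so the filter lets it through
print(lenient_utf8_decode(attack))  # ../config: the dots reappear after the filter ran
try:
    attack.decode("utf-8")          # Python's own decoder is strict
except UnicodeDecodeError as exc:
    print("strict decoder rejects it:", exc.reason)
```

The filter inspected the bytes before they were decoded, and the lenient decoder then produced the very character the filter was meant to stop.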
Canonicalization means converting something into a simpler, more fundamental form. Web sites should have code that converts user input from its many possible encodings into a single canonical form, such as UTF-8, that all later processing uses. The filters and all subsequent processes are applied after canonicalization, so every component interprets the user input the same way. As a user, there is nothing you can do about canonicalization issues on the Web sites you use. But, as a Web developer, you'll want to make sure that you canonicalize the data you receive from users before you validate it. There's a wonderful Web application developer's guide at the Open Web Application Security Project (OWASP) that describes the details of the code you'd need to write to canonicalize data.
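The canonicalize-then-filter ordering can be sketched in Python as follows, assuming the input arrives percent-encoded (as in a URL query string). The function names and the forbidden-character set are hypothetical, and the NFKC normalization step is an addition not mentioned in the answer; it folds look-alike characters such as the fullwidth full stop (U+FF0E) into a plain dot. Malformed or overlong UTF-8 is rejected outright rather than repaired.

```python
import unicodedata
from urllib.parse import unquote_to_bytes

# Hypothetical deny-list: dots, single quotes and angle brackets.
FORBIDDEN = set(".'<>")


def canonicalize(raw: str) -> str:
    """Turn percent-encoded user input into one canonical Unicode string."""
    # 1. Undo URL percent-encoding exactly once, yielding raw bytes.
    data = unquote_to_bytes(raw)
    # 2. Decode strictly as UTF-8; malformed or overlong sequences such as
    #    C0 AE raise UnicodeDecodeError instead of silently becoming '.'.
    text = data.decode("utf-8", errors="strict")
    # 3. NFKC normalization (an assumption, not from the text above) maps
    #    compatibility characters like fullwidth U+FF0E to '.'.
    return unicodedata.normalize("NFKC", text)


def validate(raw: str) -> str:
    """Canonicalize first, then run the filter on the canonical form only."""
    try:
        text = canonicalize(raw)
    except UnicodeDecodeError as exc:
        raise ValueError("rejected: malformed or overlong UTF-8") from exc
    if any(ch in FORBIDDEN for ch in text):
        raise ValueError("rejected: forbidden character")
    return text


for sample in ["hello%20world", "%2E", "%C0%AE%C0%AE/config", "%EF%BC%8E"]:
    try:
        print(repr(sample), "->", repr(validate(sample)))
    except ValueError as err:
        print(repr(sample), "->", err)
```

Here `validate("hello%20world")` returns `"hello world"`, while the plain dot, the overlong dots and the fullwidth dot all raise `ValueError`, because the filter only ever sees the single canonical form.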
Related Q&A from Ed Skoudis