How does canonicalization work? What should I do to prevent this input validation attack?
Canonicalization is a fancy word for a fairly simple concept. It stems from the fact that there are more than 47 gazillion ways to encode characters for the Web today. Some of the most popular are UTF-8 and UTF-16 (UTF-8 was originally described in RFC 2279, since replaced by RFC 3629; UTF-16 is described in RFC 2781). A single character, such as a dot (.), may be represented in many different ways, such as the ASCII byte 2E, the two-byte sequence C0 AE (an "overlong" UTF-8 form that the specification forbids but that lenient decoders still accept) and many others. The problem is, with all of these different ways of encoding user input, a Web application's filters can be easily confused if they're not carefully built. For example, if you wanted to filter dots, but only removed ASCII 2E, someone could send C0 AE instead and squeeze a dot past your filter.
Web applications often filter user input for evil characters, like quotes that might be part of SQL injection attacks or angle brackets (< and >) that might be part of a cross-site scripting attack. However, if the Web application's filter only looks for the UTF-8 form of those characters, an attacker could use another encoding, like UTF-16, to encode the evil characters and bypass the filter.
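
To make the dot example concrete, here is a minimal Python sketch. The function names (`naive_contains_dot`, `lenient_decode_2byte`) are hypothetical and exist only for illustration; the lenient decoder is a stand-in for any back-end component that accepts overlong UTF-8, not a model of a specific product.

```python
def naive_contains_dot(raw: bytes) -> bool:
    """A filter that only looks for the single ASCII byte 0x2E."""
    return b"\x2e" in raw


def lenient_decode_2byte(raw: bytes) -> str:
    """Hypothetical lenient decoder: handles ASCII and two-byte UTF-8,
    and (incorrectly) accepts overlong two-byte forms such as C0 AE."""
    out = []
    i = 0
    while i < len(raw):
        b0 = raw[i]
        if b0 < 0x80:
            # Plain ASCII byte.
            out.append(chr(b0))
            i += 1
        elif 0xC0 <= b0 <= 0xDF and i + 1 < len(raw) and (raw[i + 1] & 0xC0) == 0x80:
            # Two-byte sequence: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy.
            out.append(chr(((b0 & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sequence not handled by this sketch")
    return "".join(out)


attack = b"\xc0\xae\xc0\xae/etc/passwd"

# The byte-level filter sees no 0x2E and lets the input through...
print(naive_contains_dot(attack))        # False

# ...but a lenient decoder further down the line turns it into dots.
print(lenient_decode_2byte(attack))      # ../etc/passwd

# A strict decoder, such as Python's built-in one, rejects the input outright.
try:
    attack.decode("utf-8")
except UnicodeDecodeError:
    print("rejected by strict UTF-8 decoder")
```

The filter and the decoder disagree about what the bytes mean, and that disagreement is the whole attack.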
Canonicalization means converting something into a simpler, more fundamental form. Web sites should have code that converts user input from a variety of different encodings into a single, simple form, like UTF-8, that everything afterward will use. The filters and all subsequent processing are applied only after canonicalization, so every component interprets the user input the same way. As a user, there is nothing you can do about canonicalization issues on the Web sites you use. But, as a Web developer, you'll want to make sure that you appropriately canonicalize the data you receive from users. There's a wonderful Web application developer's guide at the Open Web Application Security Project (OWASP) that describes the details of the code you'd need to write to canonicalize data.
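
As a rough illustration of "canonicalize first, filter second," here is a minimal Python sketch. It is not the OWASP guide's code. The function names, the choice of NFKC normalization, the round limit and the list of forbidden characters are all assumptions made for this example, and a real application would pick its own canonical form and filter rules.

```python
import unicodedata
from typing import Optional
from urllib.parse import unquote

FORBIDDEN_CHARS = set("<>'\"")  # assumed filter rules, for illustration only


def canonicalize(raw: bytes, charset: str = "utf-8", max_rounds: int = 5) -> str:
    """Reduce input to one canonical text form, or raise ValueError."""
    # Strict decoding rejects malformed input such as overlong UTF-8.
    text = raw.decode(charset, errors="strict")
    for _ in range(max_rounds):
        # Undo percent-encoding, then fold look-alike forms (for example,
        # the fullwidth full stop U+FF0E) into their plain equivalents.
        decoded = unicodedata.normalize(
            "NFKC", unquote(text, encoding="utf-8", errors="strict")
        )
        if decoded == text:
            return text  # reached a fixed point: nothing left to decode
        text = decoded
    raise ValueError("input is nested in too many layers of encoding")


def passes_filter(text: str) -> bool:
    """The actual filter; it only ever sees canonical text."""
    return ".." not in text and not (FORBIDDEN_CHARS & set(text))


def validate_input(raw: bytes) -> Optional[str]:
    """Return the canonical text if it is acceptable, otherwise None."""
    try:
        text = canonicalize(raw)
    except ValueError:  # UnicodeDecodeError is a subclass of ValueError
        return None
    return text if passes_filter(text) else None


print(validate_input(b"report.txt"))                # report.txt
print(validate_input(b"%2e%2e/secret"))             # None (percent-encoded dots)
print(validate_input(b"%252e%252e/secret"))         # None (double encoding)
print(validate_input(b"\xc0\xae\xc0\xae/secret"))   # None (overlong UTF-8)
```

The design point is that `passes_filter` never sees raw bytes. Anything that cannot be reduced to the canonical form is rejected rather than guessed at, and the rest of the application should use the returned canonical string, not the original input.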