How to prevent input validation attacks

Learn what canonicalization is and what Web developers can do to prevent input validation attacks.

How does canonicalization work? What should I do to prevent this input validation attack?

Canonicalization is a fancy word for a fairly simple concept. It stems from the fact that there are more than 47 gazillion ways to encode characters for the Web today. Some of the most popular are UTF-8 (defined in RFC 2279, since obsoleted by RFC 3629) and UTF-16 (described in RFC 2781). A single character, such as a dot (.), may be represented in many different ways: as ASCII 2E, as the overlong UTF-8 sequence C0 AE (which a strict decoder should reject but a lenient one will turn into a dot), and in many other forms. The problem is that, with all of these different ways of encoding user input, a Web application's filters can easily be confused if they're not carefully built. For example, if you wanted to filter dots but only removed ASCII 2E, someone could use the alternative C0 AE form and squeeze a dot past your filter. Web applications often filter user input for evil characters, like quotes that might be part of a SQL injection attack or the angle brackets (< and >) of script tags that might be part of a cross-site scripting attack. However, if the Web application's filter only looks for those characters in their UTF-8 form, an attacker could use another encoding, like UTF-16, to represent the evil characters and bypass the filter.
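
The C0 AE example can be made concrete with a short Python sketch. Everything in it is illustrative rather than taken from a real application: `naive_filter_allows` stands in for a filter that only looks for the byte 2E, and `lenient_decode` stands in for a hypothetical downstream decoder that does not reject overlong two-byte sequences. Python's own strict UTF-8 decoder is shown for contrast.

```python
def naive_filter_allows(data: bytes) -> bool:
    """Hypothetical filter: passes input unless it contains the ASCII dot byte 0x2E."""
    return b"." not in data


def lenient_decode(data: bytes) -> str:
    """Hypothetical lenient decoder for 1- and 2-byte UTF-8 sequences.

    It deliberately omits the overlong-form check, which is the mistake
    that lets C0 AE turn into a dot.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # plain ASCII byte
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data):
            # Combine 5 payload bits from the lead byte with 6 from the next.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("byte sequence not handled by this sketch")
    return "".join(out)


def strict_decode_ok(data: bytes) -> bool:
    """Return True only if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Two overlong "dots" followed by an ASCII path.
payload = b"\xc0\xae\xc0\xae/secret"
```

With this payload the byte-level filter sees no 2E and lets the input through, the lenient decoder later produces a string starting with two dots, and the strict decoder refuses the bytes outright.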

Canonicalization means converting something into a simpler, more fundamental form. Web sites should have code that converts user input from a variety of different encoding forms into a single simple form, like UTF-8, that everything afterward will use. The filters and all subsequent processes are applied after canonicalization, so every component interprets the user input the same way. As a user, there is nothing you can do about canonicalization issues on the Web sites you use. But as a Web developer, you'll want to make sure that you appropriately canonicalize the data you receive from users. There's a wonderful Web application developer's guide at the Open Web Application Security Project (OWASP) that describes the details of the code you'd need to write to canonicalize data.
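
As a rough illustration of the canonicalize-first, filter-second order described above, here is a minimal Python sketch. It is not the code from the OWASP guide; the function names `canonicalize` and `is_acceptable` and the set of forbidden characters are assumptions made up for this example. It relies only on the standard library: strict UTF-8 decoding, one round of percent-decoding with `urllib.parse.unquote`, and Unicode NFKC normalization.

```python
import unicodedata
from urllib.parse import unquote

# Illustrative set of "evil" characters; a real application would choose its own.
FORBIDDEN = set(".<>'\"")


def canonicalize(raw: bytes) -> str:
    """Reduce user input to a single form before any filter looks at it."""
    # Strict decoding rejects malformed input such as the overlong C0 AE.
    text = raw.decode("utf-8", errors="strict")
    # One round of percent-decoding, so %2E becomes a dot. Nested encodings
    # such as %252E are outside the scope of this sketch.
    text = unquote(text, encoding="utf-8", errors="strict")
    # NFKC folds compatibility look-alikes (e.g. fullwidth U+FF0E) to ASCII.
    return unicodedata.normalize("NFKC", text)


def is_acceptable(raw: bytes) -> bool:
    """Apply the filter to the canonical form only; reject undecodable input."""
    try:
        text = canonicalize(raw)
    except UnicodeDecodeError:
        return False
    return not any(ch in FORBIDDEN for ch in text)
```

Under these assumptions, `is_acceptable(b"report")` passes, while `b"%2E%2E"`, the raw bytes C0 AE, their percent-encoded form `b"%C0%AE"`, and a UTF-8 encoded fullwidth full stop are all rejected, because each either fails strict decoding or collapses to a plain dot before the filter runs.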

This was last published in August 2006
