Ask the Expert

How to prevent input validation attacks

How does canonicalization work? What should I do to prevent this input validation attack?


Canonicalization is a fancy word for a fairly simple concept. It stems from the fact that there are more than 47 gazillion ways to encode characters for the Web today. Some of the most popular are UTF-8 (described in detail in RFC 2279, since superseded by RFC 3629), UTF-16 (RFC 2781), and so on. A single character, such as a dot (.), may be represented in many different ways, such as the ASCII byte 2E, the overlong two-byte UTF-8 sequence C0 AE (which a lenient decoder still reads as a dot), and many others. The problem is that, with all of these different ways of encoding user input, a Web application's filters can be easily confused if they're not carefully built. For example, if you wanted to filter dots, but only removed ASCII 2E, someone could use the alternative C0 AE form and squeeze a dot past your filter. Web applications often filter user input for evil characters, like quotes that might be part of SQL injection attacks or the angle brackets (< and >) around script tags that might be part of a cross-site scripting attack. However, if the Web application filter only searches for UTF-8 input, an attacker could use another encoding, like UTF-16, to encode the evil characters and bypass the filter.
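To make the dot example concrete, here is a minimal Python sketch. The names naive_filter and lenient_decode are hypothetical, and lenient_decode only imitates the kind of sloppy decoder that accepts overlong two-byte sequences; it is not the behavior of any particular product. Python's own UTF-8 decoder is strict and rejects C0 AE outright.

```python
def naive_filter(raw: bytes) -> bool:
    """Return True if the input passes a filter that only looks for the ASCII dot byte 2E."""
    return b"\x2e" not in raw


def lenient_decode(raw: bytes) -> str:
    """Hypothetical sloppy decoder: accepts any two-byte UTF-8 sequence, even overlong ones."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        is_lead = 0xC0 <= b <= 0xDF
        has_trail = i + 1 < len(raw) and 0x80 <= raw[i + 1] <= 0xBF
        if is_lead and has_trail:
            # Combine 5 payload bits from the lead byte with 6 from the trail byte.
            out.append(chr(((b & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            # Everything else is passed through one-to-one (good enough for this ASCII demo).
            out.append(chr(b))
            i += 1
    return "".join(out)


payload = b"\xc0\xae\xc0\xae/etc/passwd"

passed_filter = naive_filter(payload)       # no 2E byte present, so the filter lets it through
decoded_later = lenient_decode(payload)     # the sloppy decoder turns C0 AE into a dot

try:
    payload.decode("utf-8")                 # a strict decoder refuses the overlong form
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```

The filter and the decoder disagree about what the bytes mean, and that disagreement is the whole attack.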

Canonicalization means converting something into a simpler, more fundamental form. Web sites should have code that converts user input from a variety of different encodings into a single, simple form, like UTF-8, that everything downstream will use. The filters and all subsequent processing are applied after canonicalization, so every component has the same understanding of what the user input means. As a user, there is nothing you can do about canonicalization issues on the Web sites you use. But, as a Web developer, you'll want to make sure that you appropriately canonicalize the data you receive from users. There's a wonderful Web application developer's guide at the Open Web Application Security Project (OWASP) that describes the details of the code you'd need to write to canonicalize data.
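As a rough illustration of "canonicalize first, filter second," the Python sketch below decodes the raw bytes strictly as UTF-8, undoes one layer of URL percent-encoding, applies Unicode NFKC normalization, and only then checks for forbidden characters. The function names, the FORBIDDEN set, and the choice of these three steps are assumptions made for the example, not the code from the OWASP guide; a real application would tailor the steps to the encodings it actually accepts.

```python
import unicodedata
from urllib.parse import unquote

# Hypothetical blocklist for the example: quotes and angle brackets.
FORBIDDEN = set("<>'\"")


def canonicalize(raw: bytes) -> str:
    """Reduce user input to one canonical Unicode string, or raise UnicodeDecodeError."""
    # Strict decoding rejects invalid and overlong sequences such as C0 AE.
    text = raw.decode("utf-8", errors="strict")
    # Undo one layer of percent-encoding, again refusing invalid byte sequences.
    text = unquote(text, encoding="utf-8", errors="strict")
    # Fold compatibility forms (e.g. fullwidth '<') into their plain equivalents.
    return unicodedata.normalize("NFKC", text)


def accept(raw: bytes) -> bool:
    """Filter runs only on the canonical form, so it sees what later code will see."""
    try:
        text = canonicalize(raw)
    except UnicodeDecodeError:
        return False  # malformed encodings are rejected, not repaired
    return not (FORBIDDEN & set(text))
```

For instance, accept(b"plain text") returns True, while accept(b"%3Cscript%3E") and accept(b"%C0%AE%C0%AE/secret") both return False: the first because the canonical form contains angle brackets, the second because the overlong sequence fails strict decoding. Rejecting malformed input outright, rather than trying to fix it, is a design choice of this sketch.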

More on preventing Web application attacks

  • Visit our Web Application Attacks Learning Guide for Web application security tools and tactics to protect against these specific attack types.
  • Learn ten dos and don'ts for secure coding.

This was first published in August 2006
