This question is extremely important, as improper input validation leads to numerous kinds of attacks, including cross-site scripting, SQL injection, command injection, buffer overflows and many others.
Most applications only need to accept a limited set of characters as input, typically plain old alpha and numeric. If an app needs to get a specific name and age, for example, one alpha field and one numeric field will suffice. Without proper filtering, however, attackers might be able to use other characters, including semicolons, greater-than or less-than symbols, and quotation marks to exploit the application. If a software developer does not properly screen all forms of user input to remove unusual characters, the software may be exploitable in countless different ways. Software's input-validation function removes these characters, acting as a shield.
Unfortunately, some software developers either leave out input-validation code altogether, or they implement weak input validation that might not filter out a truly comprehensive set of characters. Also, there are dozens of ways to alter or encode data to dodge validation filters: UTF-8, Hex, Unicode, mixed case and many more. There's a great article by RSnake that shows some different encoding tricks designed to slip cross-site scripting attacks past weak input-validation code.
With that overview in mind, let's turn to your question: how can you prevent input-validation attacks? This is really an issue for software developers, and there isn't much that users can do to prevent these attacks. For software developers that want to make sure that their input-validation code is up to snuff, use the following as a checklist:
- Specify variable types: Use the type enforcement capabilities of your development environment to limit the kind of data that can be entered into some fields. In particular, if you only need to accept an integer in a field, define that variable as an integer (if your development environment and language allow you do that) so that your software will reject any entered strings.
- Don't define all possible badness; instead only accept goodness: Trying to create a comprehensive list of all bad characters for all the different kinds of attacks is next to impossible. Thus, when creating validation code, define what characters are acceptable, such as A-Z, a-z, and 0-9, and filter out everything else. Such a "deny-all-except-for-certain-allowable-characters" approach is a far stronger way to write filtering code.
- Limit size of input: If you ask for someone's age, keeping it to a three-digit field will cover every reasonable case, even the centenarians in your user population. If you ask for someone's name, a hundred or so characters are reasonable. That way, even if you aren't properly filtering for characters appropriately, you are still limiting the real estate that the attacker has to pull off an attack.
- Canonicalize before filtering: If user input is encoded in a fashion that the filters aren't designed to handle, you very well might get hacked. Thus, whenever you receive user input, convert it to a standard encoding scheme, such as plain ASCII, before applying your filters. This process is known as canonicalization, and it converts character streams of different encoding patterns to a single format that is compatible with your filtering code.
- Filter all input: Filter every form of input to your application, including data that comes in via the network, the GUI, files read from the file system and so on. Don't assume that one of your fields or input vectors isn't important. Attackers will hunt for weaknesses and exploit them, so cover all of your bases.
- Filter on the server side: In many applications, attackers might be able to control clients, such as browsers or thick-client GUIs, tweaking their functionality to bypass filtering done at the client. Thus, filter on the server side to protect all back-end functionality.
- Don't worry about multiple layers of filtering: Sometimes, a single application will include multiple modules, each of which filters input that flows through that mod. This architecture could result in a single set of input getting filtered multiple times as it snakes its way through the application. While that might be a performance concern, it's actually a good thing from a security perspective. The layers of filtering act like belts and suspenders to keep the overall application secure.
- Use tried-and-true filters if available: Rather than rolling your own code for user input validation, use filters that have been carefully developed and rigorously tested by others, if you have access to such code. Some software development firms and large enterprises doing in-house development have defined reusable input validation code for all software they create. Find out if your organization has such code, learn how it works, and then use it. If you don't have such code in-house, you can adapt user input-validation code snippets from a variety of free sources. One of my favorites is the user input-validation code for PHP called Inspekt, funded by the OWASP project. Even if you don't use PHP, the concepts in the Inspekt project can be leveraged in other languages.
- Get a penetration test: To make sure your code is secure, subject it to a controlled penetration test to see if flaws can be identified.
Dig deeper on Software Development Methodology
Related Q&A from Ed Skoudis, Contributor
At Black Hat 2006, researcher Joanna Rutkowska unveiled a piece of machine-based malware called the Blue Pill. But is it a serious threat to your ...continue reading
Wi-Fi on airplanes seems like it will be unavoidable in the future, but what security risks does it pose? In this security threats expert response, ...continue reading
There are some rare forms of malware that antivirus software doesn't pick up on, but there are some good tools to remove all sorts of malware.continue reading
Have a question for an expert?
Please add a title for your question
Get answers from a TechTarget expert on whatever's puzzling you.