Web 2.0 application development techniques introduce new information security risks

Ajax, Java and other dynamic application coding methods have pulled computing power over to the client, introducing new risks and resurrecting old ones.

The world of Web application development has recently been re-energized through the advent of Ajax technologies. Through these techniques, Web sites can appear to be more dynamic and interactive, giving the user the experience of a desktop application but with the ease of deployment and maintenance that comes with a server-based application. Enterprises have embraced this combination as a way to minimize deployment costs while maximizing user satisfaction.

Yet these technologies and techniques carry a security risk. The difference between traditional Web applications and newer Web 2.0 Ajax applications is the amount of logic, data and processing that occurs on the client side. No longer are users merely dealing with HTML and Flash, but with complex programmatic logic in the form of JavaScript and sometimes massive amounts of structured data in the form of XML or JavaScript Object Notation (JSON). This carries a twofold problem: the exposure of business logic to the end user, and the possibility of new threat vectors against the application. Enterprises need to understand the nature of these applications, and whether the threats associated with Ajax are a new breed or simply another way to look at the same old problems of distributed Web application security.

Ajax is a catchall description for a collection of development practices and technologies that have existed for some time. At its heart, it is the ability of a browser to send asynchronous requests to a server, and to respond to them through custom JavaScript code instead of simply rendering the response to the screen. There are other ancillary technologies included in these abstract terms: visual effects, interactive components such as rich-text editors, data serialization and more. However, these are largely subordinate to the primary goal: allow a browser to act like an asynchronous client and bypass the browser's default behavior for rendering responses.

Ajax apps are not Web services, RESTful or otherwise, that are designed for consumption by other software. Web-based apps are functionally designed to be experienced via a browser. Securing browser-based applications requires securing the user--who wants to protect his private data--and the application, which wants to secure user data and its code. In the context of an Ajax app, it's essential to prevent private data from leaking from the client to the server, and to ensure that an application user's data does not leak from the server to a client. Also, sensitive information from the server should not be passed to any clients.

To identify threats, it's important that development and security teams understand threats to regular old Web 1.0 applications, like cross-site scripting (XSS) and SQL injection attacks (see "Web App Attacks," below). Then organizations can look at what changes upon adding the ability to use JavaScript code in the browser to send and receive asynchronous messages (often containing structured data). It becomes clear that nothing has fundamentally changed; there aren't any new kinds of attacks, just new ways to perpetrate them.

Web App Attacks
Before making sense of Ajax threats, security teams need to understand traditional Web application attacks.

SQL Injection Attacks
A user passes to a server input that is inserted into a SQL query as a raw string, thus allowing the user to directly affect the database.

Cross-Site Scripting (XSS)
A user passes input containing JavaScript, which is rendered as output in another user's browser, and then executed.

Session Hijacking
Relying on sequential, non-random or otherwise guessable tokens for establishing important session characteristics, such that a user can experiment with the query string to easily access another user's data (e.g., sending URLs like http://widgets.com?order_num=4 and http://widgets.com?order_num=5), and/or having non-expiring session tokens that can be copied and used by an impersonator.

Buffer Overruns
A user sends malformed text to server code, which either stores it or manipulates it in such a way that the data overruns the memory allocated to hold the value, thus causing unintended execution of non-application code.

Data Leakage
A Web page is constructed of HTML mixed with the raw results of a SQL query, and a user, by virtue of one of the above attacks or sheer accident, causes alternative data to be rendered into the page, thus exposing either private data or useful structural information about the application, which could lead to further attacks.

In Writing Secure Code, authors Michael Howard and David LeBlanc identify two principles of secure applications: "All input is evil until proven otherwise," and "Data must be validated as it crosses the boundary between untrusted and trusted environments."

In the context of a Web application, this means any data passed from a client's browser to a server must be validated before use in any context. There are three general uses for user-supplied data: it's stored in a database through a SQL query; it's used as a value in a calculation; or it's rendered as output back to the user.

Users, meanwhile, can submit data to a server in two ways: URI-encoded values--either in the querystring or as the entity body of an HTTP POST--or HTTP headers, either through a cookie or other HTTP header.

Nothing about Ajax changes these in any way. Ajax applications do not use user-supplied data in fundamentally new ways; they may render the output back as an HTML snippet instead of a fully formed HTML page, and they may perform calculations on the data on the client instead of on the server, but they are still either storing the data, performing a calculation using it, or rendering it.

Likewise, the XMLHttpRequest object that is the backbone of asynchronous processing is still just a mechanism for sending HTTP requests and receiving HTTP responses, which means data is passed to the server in the same ways it always has been. "There is nothing new in the field of security; input validation is still the only major concern," says Billy Hoffman, lead researcher at SPI Dynamics.

This doesn't mean developers have nothing to worry about. The need for good server-side input validation is unchanged; however, Ajax applications place a greater burden on developers to provide better client-side input validation. Beyond that, the ways in which the classic attacks are perpetrated, and the methods for discovering those attacks, have changed as the prevalence of complex JavaScript in the client has increased. Some attacks include:

XSS, but hidden. XSS attacks have been around for a while. In a standard Web application, every GET or POST to the server results in a response that is rendered in its entirety by the browser. Nefarious JavaScript is rendered either in a <script> block or as the value of an event handler directly in the HTML code, and could be seen by using the browser's "view source" capabilities. The JavaScript was often obfuscated by encoding it through various schemes, but could be found by an attentive person looking for traces of it.

With Ajax applications, content is often injected into a page in such a way that "view source" is meaningless. When an application uses the XMLHttpRequest object to send an asynchronous request to the server, the results are often added into the content through the use of the innerHTML property of DOM elements. For example:
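A minimal sketch of this pattern follows. The helper name `showResponse`, the `results` element ID and the `/search` URL are assumptions made for illustration and do not come from any particular framework.

```javascript
// Hypothetical helper: when the asynchronous request completes, copy the
// raw response body into the page without any encoding or inspection.
function showResponse(xhr, targetElement) {
  xhr.onreadystatechange = function () {
    // readyState 4 means the response has been fully received.
    if (xhr.readyState === 4 && xhr.status === 200) {
      targetElement.innerHTML = xhr.responseText;
    }
  };
}

// In a browser (assumed element ID and URL):
//   const xhr = new XMLHttpRequest();
//   showResponse(xhr, document.getElementById("results"));
//   xhr.open("GET", "/search?q=widgets", true);
//   xhr.send();
```

Note that current browsers do not run a <script> element inserted through innerHTML, but event-handler attributes in the injected markup (such as onmouseover or onerror) still execute, and some Ajax frameworks evaluate returned scripts explicitly.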

If the response from the server contains executable JavaScript, that code will execute within the context of the loaded page, but will be invisible to the user through "view source" since it only displays the original page as retrieved by the browser, not the current state of the DOM. This isn't new, but it is more difficult to trace, since the offending code is invisible without a DOM inspector tool.

JSON, XML and serialized data. While there are only two ways a browser submits data to a server--still true with Ajax applications--there are important changes to representations of the data (see "Breaking Java," below). Historically, data transmitted through these vectors were name/value pairs, where the name and the value were represented as simple scalars, or arrays of scalars. For example:
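A minimal sketch of such a request body, assuming two form fields called "name" and "age" with made-up values:

```javascript
// Hypothetical form fields; the values are made up for illustration.
const fields = { name: "Alice Smith", age: "34" };

// URLSearchParams serializes to application/x-www-form-urlencoded,
// the same encoding a browser uses for a form POST body.
const formBody = new URLSearchParams(fields).toString();

console.log(formBody); // name=Alice+Smith&age=34
```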

Server-side validation code generally tackles known values in the URI-encoded data, scrubbing it for specific purposes. These values are probably destined for the database, so the code primarily ensures they don't contain executable SQL statements and/or passes them into SQL queries, using parameters instead of constructing the query using string concatenation. Perhaps the code will use the "age" variable to calculate some value, and therefore ensures that it is a numeric type as well. Finally, the resulting HTML response might use the value of "name" as a piece of visual data for the user, and will ensure it doesn't contain executable JavaScript.

There are Ajax applications, though, that don't represent the data this way. Instead of posting a collection of URI-encoded data, they might post a single name/value pair, using some form of serialized data representation. Here's the same POST, but with JSON serialization:
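As a sketch, assuming the same hypothetical "name" and "age" values and an assumed field name of "data" for the single name/value pair:

```javascript
// Hypothetical record, serialized as one JSON document.
const record = { name: "Alice Smith", age: 34 };
const json = JSON.stringify(record); // {"name":"Alice Smith","age":34}

// The whole document travels as the value of a single field.
// "data" is an assumed field name.
const jsonBody = new URLSearchParams({ data: json }).toString();

console.log(jsonBody);
```

The server no longer sees separate "name" and "age" fields; it sees one opaque string that must be decoded before any individual value can be checked.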

Breaking Java
JavaScript Hijacking exploits a loophole in JSON.

Users of JavaScript Object Notation (JSON), take note: According to recent research from Fortify, there's a clever twist on cross-site scripting called JavaScript Hijacking.

Essentially, the most important protection for a user against nefarious JavaScript is the Same-Origin Policy. Browsers enforce the rule that any script appearing in a <script> block can only make requests back to the same server and port number from which the containing page originated. The only exception to this rule is that any script tag that uses the "src=" attribute may download script from anywhere and execute it in context; therefore, if you serve JSON data as the result of a simple GET operation, another site can simply embed your JSON in its own page and utilize the data it finds, sending it to its own servers for processing.

There are two primary defenses against this technique: only serve JSON data in response to POST requests (a <script src=...> tag always issues a GET request), and/or wrap your JSON in a custom prefix that prevents the JSON from being executable by anything but the valid target page, which knows how to strip out the wrapper before parsing the data.
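The second defense might be sketched as follows. The `while(1);` prefix and the function names are assumptions chosen for illustration; any prefix that stops the response from running as a script would serve.

```javascript
// Assumed prefix: an infinite loop that hangs any page that loads the
// response through <script src=...> before the data is ever evaluated.
const GUARD_PREFIX = "while(1);";

// Server side: wrap the JSON before sending it.
function wrapJson(value) {
  return GUARD_PREFIX + JSON.stringify(value);
}

// Client side: the legitimate page reads the response as text through
// XMLHttpRequest, so it can strip the prefix before parsing.
function unwrapJson(responseText) {
  if (!responseText.startsWith(GUARD_PREFIX)) {
    throw new Error("Response is missing the expected guard prefix");
  }
  return JSON.parse(responseText.slice(GUARD_PREFIX.length));
}
```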

--Justin Gehtland

Now, the server-side code has a different kind of validation problem, one that involves two steps: de-serialization of the data, followed by property-by-property validation of individual values. JSON isn't the only option for this kind of data, either. XML gets a fair amount of usage for data serialization, and now YAML (YAML Ain't Markup Language) as well (though mostly in Rails-based sites). There is also the rising tide of microformats, or custom data syntaxes that maximize efficiency of the representation at the expense of creating custom parsers for each syntax. In any of these cases, server-side validation now has two problems instead of one, and the complexity is a little greater as a result.
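The two steps might look like the sketch below. The "name" and "age" properties, their limits and the function name `parseAndValidate` are assumptions for illustration.

```javascript
// Step 1: de-serialize. Step 2: validate each expected property.
// Field names and limits are assumptions made for this sketch.
function parseAndValidate(rawJson) {
  let data;
  try {
    data = JSON.parse(rawJson);
  } catch (err) {
    return { ok: false, errors: ["payload is not well-formed JSON"] };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, errors: ["payload must be a JSON object"] };
  }

  const errors = [];
  if (typeof data.name !== "string" || data.name.length === 0 || data.name.length > 100) {
    errors.push("name must be a string of 1 to 100 characters");
  }
  if (!Number.isInteger(data.age) || data.age < 0 || data.age > 150) {
    errors.push("age must be an integer between 0 and 150");
  }
  if (errors.length > 0) {
    return { ok: false, errors };
  }

  // Copy only the validated properties; anything else is ignored.
  return { ok: true, value: { name: data.name, age: data.age } };
}
```

Returning a fresh object that holds only the validated properties keeps unexpected extra fields from flowing further into the application.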

Periodic execution and the hidden post. It used to be that nefarious JavaScript or rewritten URLs or other client-side attacks that involve making requests to the server would be visible to the user because the browser would display the URL in the address bar when the request was made. This made it at least possible for a user to discover the problem by viewing the URLs being used for navigation. Ajax applications make this more difficult by sending requests via a hidden back channel (XMLHttpRequest). Without the use of network sniffers or in-browser components that can recognize XHR traffic, it is impossible to see the requests being sent on behalf of the user.

In traditional Web applications, these requests were visible and triggered by one of two user actions: clicking a link or submitting a form. In Ajax applications, the requests are hidden, and can be triggered by a variety of user actions such as sliding the mouse over an element or tabbing into or out of a field. Many sites take that a step further and use periodic executors to send requests with no user triggers at all. Using JavaScript's timeout capabilities, some applications establish a loop that triggers a request every x seconds, sometimes sending only application-provided data, but sometimes user-provided data as well. The request is not only hidden, but might happen while the user is away from the computer entirely.

Do Ajax applications even require new security solutions? The problem still comes down to validating user-provided input, and ensuring that whatever is sent to the user conforms to some valid representation format. The user-provided input might take on slightly different shapes in an Ajax world (such as serialized JSON), and responses back to the user might not be fully formed HTML (they could be serialized XML or JSON). These changes imply complexity, but not an overarching change in security; solutions remain relatively the same.

Scrubbing user input, validating system output. Regardless of the mechanisms by which data reaches a server, there are only three things an application will do with it: store, modify or render it. Server-side validation should be specific to how the data will be used: SQL scrubbing, type enforcement and HTML encoding, respectively. Sometimes a piece of data will be used in multiple ways during a single request, and it follows that each appropriate type of validation should be performed.

We can categorize potential system outputs four ways:

    1. Rendering as HTML. HTML representation is constructed either through string manipulation or a templating engine, and rendered to the browser.

    2. Executing as SQL. If we consider the application to be the code running in a process controlled by the Web server, SQL queries are just another output from the application to be consumed by the database server. SQL queries should be constructed using high-level tools that protect against SQL injection through parameterized queries.

    3. Rendering to the browser in non-HTML format. Generally, exporting a textual representation of structured data as JSON, XML, YAML, a microformat or other, as well as any of the many MIME types.

    4. Saving to server file system. Data appended to a file-based data store or other file-based format.

Again, each type of output has specific validations that should be performed. Outbound HTML should be scrubbed for <script> blocks if you don't want JavaScript to execute because of the call. SQL statements should be constructed using known safe techniques for preventing injection attacks. Custom output formats require custom validation, as does custom file-based storage.
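For the SQL case, a parameterized query keeps user input out of the statement text entirely. The sketch below only builds the query description; the `{ text, values }` shape is the query config accepted by the node-postgres client, while the table, columns and function name are hypothetical.

```javascript
// Builds a parameterized INSERT. User input appears only in the values
// array, never in the SQL text. Table and column names are hypothetical.
function buildInsertUser(name, age) {
  return {
    text: "INSERT INTO users (name, age) VALUES ($1, $2)",
    values: [name, age],
  };
}

// A classic injection attempt stays inert: it is passed as data only.
const insertQuery = buildInsertUser("Robert'); DROP TABLE users;--", 34);
console.log(insertQuery.text); // INSERT INTO users (name, age) VALUES ($1, $2)
```

With string concatenation, the same input would have become part of the statement itself.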

Since there are new vectors by which user data can arrive at the server, and several ways for data to reach the client, developers need to expand the scope of known threat mitigation techniques. A comprehensive approach will include complete two-way data validation, client-side and server-side enforcement, and a rigorous testing harness. It should go without saying (but unfortunately still doesn't) that automated testing is the most fundamental part of any security infrastructure.

Openness is the Key
Choose your Ajax development framework carefully.

Ajax development should be done in a standard framework. The major frameworks, like Prototype, jQuery, MochiKit and others, have lots of good code for dealing with common problems. More importantly, they have lots of eyes looking at them checking for bugs. Openness benefits security; organizations should choose a framework with an active user base.

However, Ajax frameworks are immature and haven't tackled many major security issues. Testing vendor Fortify reported in April that only the DWR framework had any solution for--or even mentioned--JavaScript Hijacking. Since then, other frameworks, such as Dojo and Prototype, have caught up.

Worse, most of the frameworks--either commercial or open source--poorly document known or suspected vulnerabilities, as well as workarounds.

--Justin Gehtland

String parsers, HTML and JavaScript. When generating HTML, your framework (see "Openness is the Key," above) should be explicit about the uses for each piece of data. If the value is for display as raw data only, the string should be HTML-encoded. This means that any special characters from the HTML specification will be rendered in an escaped format, preventing the data from interacting with the HTML parser during render. For example, the string "<h1>Some HTML</h1>" would be rendered as "&lt;h1&gt;Some HTML&lt;/h1&gt;". When rendered, the user sees the original string, because the browser decodes the escaped characters for display instead of parsing them into DOM elements.
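A minimal encoder might look like the sketch below. The function name `escapeHtml` is an assumption; most platforms ship an equivalent utility that should be preferred over a hand-written one.

```javascript
// Replaces the characters that are significant to an HTML parser.
// The ampersand is handled first so later entities are not double-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml("<h1>Some HTML</h1>")); // &lt;h1&gt;Some HTML&lt;/h1&gt;
```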

If formatting is important, but you want to prevent JavaScript from being executed, you will need to remove <script> blocks from the text. Most development platforms will provide utilities, either in a separate utility class or as part of the string class. This might still leave loopholes; for example, the following nefarious HTML is only partially cleansed by removing <script> tags:
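An illustrative sketch, assuming a naive filter called `stripScriptBlocks` and a made-up piece of submitted markup:

```javascript
// Naive filter: removes <script>...</script> blocks and nothing else.
function stripScriptBlocks(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, "");
}

// Hypothetical user-supplied markup.
const submitted =
  '<script>alert("one")</script>' +
  '<a href="#" onmouseover="alert(\'two\')">Hover here</a>';

console.log(stripScriptBlocks(submitted));
// <a href="#" onmouseover="alert('two')">Hover here</a>
```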

In this case, executable JavaScript is embedded as an attribute of an otherwise allowable HTML construct, not as the body of a script tag. The only reliable way to ensure that such JavaScript isn't executed is to HTML-encode the remainder of the string after stripping the <script> block. If you must allow users to upload formatted data to your Web application, take advantage of one of the many markup syntaxes that exist (such as Textile, Markdown or others). These syntaxes provide a custom markup language that can be translated into fully formatted HTML by a server-side library prior to rendering, but that does not give the user access to items like <script> blocks and event handlers. Finally, when using a client-side Ajax framework, make sure you understand how to turn automatic JavaScript parsing on or off. Your default strategy should be to disallow it, though some development frameworks, such as Ruby on Rails, will expect a more liberal policy on JavaScript execution due to the nature of their built-in Ajax support.

Standardized and customized data (de)serializers. If you are using structured data, make sure you have an appropriate parser for the syntax (standard parsers are available for all popular syntaxes). If using XML, create or use an existing schema that can be used to validate document structure and content. Many microformats have such schemas, and you should enable your XML library's validation for any inbound or outbound data.

Modern templating engines. Most modern Web development platforms provide a templating engine that can automatically HTML-encode any dynamic data being interpolated into the template. Some examples:
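Server-side engines differ in syntax, but the idea can be sketched in JavaScript with a tagged template. This illustrates automatic escaping in general and is not any particular engine's API; the `html` tag and the `escapeHtml` helper are hypothetical names.

```javascript
// Hypothetical encoder for characters significant to an HTML parser.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tagged template: literal parts written by the developer are trusted,
// while every interpolated value is escaped automatically.
function html(strings, ...values) {
  return strings.reduce(
    (out, literal, i) =>
      out + literal + (i < values.length ? escapeHtml(values[i]) : ""),
    ""
  );
}

const userName = '<img src=x onerror="alert(1)">';
const page = html`<p>Welcome, ${userName}</p>`;

console.log(page);
// <p>Welcome, &lt;img src=x onerror=&quot;alert(1)&quot;&gt;</p>
```

Because escaping happens inside the template mechanism, a developer has to opt out deliberately to emit raw markup, rather than remember to opt in every time.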

Some even have a way to globally escape all rendered values unless specifically asked otherwise. Your code should take advantage of these tools as much as possible, only allowing unescaped HTML to be rendered if it was generated by your application directly, or via parsing a markup language.

Testing, monitoring and reporting. Finally, take advantage of the testing framework provided by your development platform. Perform unit testing on each security layer, ensuring that data validation and representation validation work with known examples of potential threats as test input. Perform functional testing and use some kind of user-spoofing testing technique to ensure the chain from browser to server and back. Make sure to run those tests often, locally and in a continuous integration environment.
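A unit test of one security layer might be sketched as below, feeding known attack strings to an encoder and failing if any markup survives. The encoder, the sample list and the function names are assumptions; a real suite would run inside your platform's testing framework.

```javascript
// Function under test: a hypothetical HTML encoder.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Known examples of potential threats, used as test input.
const attackSamples = [
  "<script>alert(1)</script>",
  '<img src=x onerror="alert(1)">',
  "<a href='#' onmouseover='alert(1)'>x</a>",
];

// Returns the samples whose encoded form still contains raw markup
// delimiters or quotes; an empty array means every sample passed.
function runEncodingTests() {
  const failures = [];
  for (const sample of attackSamples) {
    if (/[<>"']/.test(escapeHtml(sample))) {
      failures.push(sample);
    }
  }
  return failures;
}

const failedSamples = runEncodingTests();
if (failedSamples.length > 0) {
  throw new Error("Unescaped output for: " + failedSamples.join(", "));
}
```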

With the advent of modern JavaScript unit testing frameworks, your client-side logic should be as thoroughly tested as your server-side code.

Ajax requires a thorough application of techniques already proven in traditional Web apps. Server-side validation needs to be applied to data arriving from the client, and sometimes, that validation needs to include the use of a standard parser. Outbound representations need to be verified against their intended use.

Organizations should take a comprehensive, holistic approach to application security by using validation methods on both sides of the untrusted boundary. This comprehensive approach needs to include thorough testing of the server and client-side logic. You need to understand how to debug Ajax apps using tools like FireBug, the IE Developer Toolbar and others.

As long as you adhere to primary rules of Web security--"All input is evil until proven otherwise," and "Data must be validated as it crosses the boundary between untrusted and trusted environments"--then Ajax shouldn't impact the security of the application.
