Data masking hides information from testers

Start-up DataGuise enters a data masking market fueled by regulatory compliance pressures. One analyst says companies prefer masking over other techniques.

It has become a cliché: Data security is only as strong as its weakest link. The problem with clichés is that they are true.

The weak link is often the development and testing environment. A company takes every reasonable precaution to button up sensitive databases and then uses actual production data to develop and test new applications. The exposure is exacerbated when the work is outsourced, often to foreign partners.

This once made perfect sense: Develop and test with actual customer records to make sure your apps work when they go into production. Industry and regulatory standards such as PCI and SOX, along with security best practices, have changed all that.

So, companies can either generate test data or mask real data. The challenge is producing data that works without exposing customer information to the world. Most organizations that take this risk seriously are doing one or the other in-house, and, of those, most prefer data masking because it is easier to scale across multiple applications.

"Ninety percent of organizations prefer to mask data," said Noel Yuhanna, a principal analyst at Forrester Research Inc. "Take a copy and mask only the sensitive data, which may be three or four columns at most and be done with it. Data creation is more complex and doesn't always represent actual business scenarios."

It's largely a question of scale, according to Yuhanna. If you need to generate complex test data, with multiple fields and different combinations of fields, chances are you'll only be able to apply it to an application or two. Data generation scales to multiple applications if the test data requirement is simple, say, just a name and address, but that's not typically the case. Consider, he said, the complexity for an insurance company, which has a customer who has been married three times, divorced twice and has five children.

Data masking, on the other hand, simply substitutes false values for real ones, keeping the data formats, regardless of number and types of fields.
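To make the idea concrete, here is a minimal Python sketch of format-preserving substitution. It is an illustration only, not how any vendor named in this article implements masking. The key, the column names and the function names are all hypothetical. Each digit is replaced with another digit and each letter with another letter of the same case, while punctuation and length stay as they were, so the masked value still fits the original field format.

```python
import hashlib
import hmac
import string

# Hypothetical key for illustration; a real deployment would manage it
# outside the development and test environment.
MASKING_KEY = b"example-key-not-for-production"

# Hypothetical set of sensitive columns ("three or four columns at most").
SENSITIVE_COLUMNS = {"name", "ssn", "card_number"}


def mask_value(value: str, key: bytes = MASKING_KEY) -> str:
    """Substitute false characters for real ones, keeping the format.

    The substitution is derived from a keyed hash of the value, so the
    same input always yields the same masked output under the same key.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch.isdigit():
            masked.append(string.digits[b % 10])
        elif ch.isupper():
            masked.append(string.ascii_uppercase[b % 26])
        elif ch.islower():
            masked.append(string.ascii_lowercase[b % 26])
        else:
            # Separators such as "-" or " " are kept so the format survives.
            masked.append(ch)
    return "".join(masked)


def mask_rows(rows, sensitive=SENSITIVE_COLUMNS):
    """Return copies of the rows with only the sensitive columns masked."""
    return [
        {col: mask_value(val) if col in sensitive else val
         for col, val in row.items()}
        for row in rows
    ]


if __name__ == "__main__":
    sample = [{"name": "Jane Doe", "ssn": "123-45-6789", "state": "CA"}]
    print(mask_rows(sample))
```

Because the output depends only on the key and the input, separate teams using the same key would get matching masked values. This sketch makes no claim to cryptographic strength, and it does not preserve check digits or other validation rules an application may enforce.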

Even so, performing data masking in-house grows more burdensome, given environments with hundreds of new or Web-enabled applications and the increasing compliance and security requirements. As with other security requirements, from log management to database monitoring to configuration management, technologies to automate the process start to make good business sense.

A number of vendors offer such products. Typically, Yuhanna said, these are archiving companies that added data masking as a natural extension of their core technology. Extract, transform and load (ETL) companies such as Informatica Corp., which process incoming data into required formats, are also in this market.

Other vendors include IBM, through its Princeton Softech acquisition, Applimation, Direct Computer Resources Inc.'s DataVantage, Compuware Corp., and Camouflage.

Oracle also offers data masking for its databases through an add-on product.

Start-up DataGuise Inc., which announced its initial products recently, enters this market as a pure-play vendor, counting on growing interest as more organizations look for better ways to protect their data in testing and QA.

DataGuise offers two products, dgDiscover, which discovers sensitive data across databases, and dgMasker, which also works directly within the database to automatically mask data for development and testing.

"Home-grown solutions often get stopped because of turnover in IT and documentation is not there," said Erik Jarlstrom, DataGuise vice president of customer advocacy. What's more, applications in complex environments talk to each other, but are typically developed and tested by separate groups, each using their own masking. "A lot of companies want to simplify; they don't want to do scripting and home-grown solution, but have one solution across the enterprise."

DataGuise plans to release a third product, DataGuard, in Q1 2009, which will obfuscate data from business users, so they only see the data they need to do their jobs.

Forrester predicts that 35% of U.S. enterprises will be implementing data masking by 2010. Yuhanna believes most of them will keep it in-house, but will use consulting services rather than do it themselves.

The data masking product market is very small, he said, perhaps $20 million annually, with a potential to grow to $100 million in four years. But service providers such as Deloitte and Accenture can be part of a $500 million market by then, Yuhanna said.

In any case, although data masking is not plug-and-play, the technology is pretty straightforward, which is why many enterprises at least attempt to do it on their own.

"This is not rocket science," he said. "It's more about the process, the policies you implement and the procedures you follow. That's what makes masking more complex."
