Ask The Security Expert: Questions & Answers

Can data anonymization ensure the privacy of Web application user data?

EXPERT RESPONSE FROM: Michael Cobb

QUESTION POSED ON: 19 September 2007
What is data anonymization and is it a concept that enterprises should employ to ensure the security and privacy of Web application user data?

EXPERT RESPONSE
There are many laws and regulations requiring an organization to protect the personally identifiable information (PII) it collects. PII is any piece of information that can potentially be used to uniquely identify, contact, or locate a single person, such as a Social Security number, email address, credit card number or fixed IP address. Web applications most often collect this type of information when a user buys something from the site or registers to use its services. But this is not the only type of data that Web applications collect and store about their users: products purchased, pages visited and advertisements clicked are just some of the many statistics often recorded about a visitor. Even when an organization does a good job of securing this data from attackers, users' privacy can still be put at risk when the data is analyzed.
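As a rough illustration of how PII can slip into the behavioral data described above, the sketch below masks email addresses and IPv4 addresses in a clickstream or access-log line before it is stored for analysis. It is a minimal Python sketch, assuming simple regular expressions are acceptable; the patterns and the `mask_pii` function are hypothetical and will miss many real-world formats, so a production system would need a far more thorough approach.

```python
import re

# Illustrative patterns only (assumption): they catch common email and IPv4
# formats but are not exhaustive PII detectors.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_pii(text: str) -> str:
    """Replace email addresses, then IPv4 addresses, with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return IPV4_RE.sub("[IP]", text)


if __name__ == "__main__":
    line = "203.0.113.7 user=alice@example.com viewed /products/42"
    print(mask_pii(line))  # [IP] user=[EMAIL] viewed /products/42
```

Masking IP addresses this way also hides the fixed-IP case mentioned above, at the cost of losing any analysis that depends on a visitor's network location.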

As an example, consider a pharmaceutical company that sells drugs over the Internet. Its marketing department may want to mine the collected user data to design a new advertising campaign. To prevent privacy breaches through data inference, it is critical that this data be anonymized before it is analyzed. Data anonymization allows analysis to take place while aiming to ensure that no sensitive information can be learned about a specific individual. The process is a lot harder than it may seem, because even a combination of seemingly non-personal data items can be exploited to deduce whom a record belongs to.
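A first, and on its own insufficient, step is to strip direct identifiers from each record before it reaches the marketing team. The Python sketch below assumes hypothetical field names (`name`, `email`, `ssn`, `credit_card`, `ip_address`) and a hypothetical `strip_direct_identifiers` helper; a real schema would differ.

```python
from typing import Any

# Hypothetical names for direct-identifier fields; a real schema would differ.
DIRECT_IDENTIFIERS = {"name", "email", "ssn", "credit_card", "ip_address"}


def strip_direct_identifiers(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record without direct-identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


order = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "date_of_birth": "1970-03-14",
    "gender": "F",
    "zip": "02139",
    "product": "drug-X",
}
print(strip_direct_identifiers(order))
```

The output still contains date of birth, gender and ZIP code, which is exactly the kind of non-personal combination that can be used to work out whom a record belongs to.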

Continuing the example, suppose the dataset given to the marketing department has had customer names and email addresses removed. Research shows that about half of the U.S. population can be uniquely identified from just three pieces of information: date of birth, gender and place (city or town). If the five-digit ZIP code is known instead of the place, the figure rises to roughly 87%. Date of birth, gender and place would provide useful information for an advertising campaign, but taken together they could enable an employee to re-associate a customer with his or her purchase records, for instance by matching them against another source, such as a public record or mailing list, that pairs the same three attributes with a name. This is called a re-identification disclosure. If those purchases were for drugs that treat a particular illness, the employee could deduce that the customer has that disease, resulting in a predictive disclosure and a breach of the customer's privacy.
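One way to measure the risk described above is to count how many records share each combination of quasi-identifiers (here date of birth, gender and ZIP code). The smallest group size is the k in k-anonymity, a formal privacy model not named in the original answer; k = 1 means at least one customer is unique on those fields and is a candidate for re-identification. The following Python sketch uses hypothetical field names, helper names and sample records.

```python
from collections import Counter

# Quasi-identifiers discussed in the article; the field names are hypothetical.
QUASI_IDENTIFIERS = ("date_of_birth", "gender", "zip")


def k_anonymity(records, quasi_ids=QUASI_IDENTIFIERS):
    """Return the smallest group size over all quasi-identifier combinations.

    k == 1 means at least one record is unique on these fields and could be
    linked back to a person using an outside source that holds the same fields.
    """
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values())


def unique_records(records, quasi_ids=QUASI_IDENTIFIERS):
    """List the records whose quasi-identifier combination occurs only once."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return [r for r in records if groups[tuple(r[q] for q in quasi_ids)] == 1]


purchases = [
    {"date_of_birth": "1970-03-14", "gender": "F", "zip": "02139", "product": "drug-X"},
    {"date_of_birth": "1970-03-14", "gender": "F", "zip": "02139", "product": "drug-Y"},
    {"date_of_birth": "1982-11-02", "gender": "M", "zip": "10001", "product": "drug-Z"},
]
print(k_anonymity(purchases))     # 1
print(unique_records(purchases))  # only the drug-Z record
```

The drug-Z customer is the only one with that birth date, gender and ZIP code, so k is 1. Note that `k_anonymity` raises `ValueError` on an empty list, since `min` has nothing to compare.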

When analyzing Web application data, take steps to anonymize it first, and carefully consider whether any sensitive data needs to be included at all. Unfortunately, data anonymization is still in its infancy. Disguising or hiding certain data in the original dataset can provide general privacy protection while still allowing reasonably accurate analysis; for example, age groups could be used instead of exact dates of birth. However, the only effective way to prevent disclosures like the one above is to remove analytically valuable information from the dataset, which inevitably makes it less useful. Finally, one more important warning: real customer data should never be used when testing a new system.
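The age-group idea can be sketched as generalization: replacing exact values with coarser ones until records are no longer unique. The Python sketch below assumes 10-year age bands and truncation of ZIP codes to their first three digits; both choices, the helper names and the sample records are illustrative assumptions, not a recommended policy. It uses the question's date, 19 September 2007, as "today" so the output is reproducible.

```python
from collections import Counter
from datetime import date


def age_band(dob: date, today: date, width: int = 10) -> str:
    """Map an exact date of birth to an age range such as '40-49'."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    age = today.year - dob.year - (0 if had_birthday else 1)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize(record: dict, today: date) -> dict:
    """Return a copy with the exact DOB replaced by an age band and a shorter ZIP."""
    out = dict(record)
    out["age_band"] = age_band(out.pop("date_of_birth"), today)
    out["zip"] = out["zip"][:3] + "**"  # assumption: keep only the first three digits
    return out


def min_group_size(records, keys) -> int:
    """Smallest number of records sharing one combination of the given keys."""
    return min(Counter(tuple(r[k] for k in keys) for r in records).values())


TODAY = date(2007, 9, 19)
customers = [
    {"date_of_birth": date(1965, 3, 14), "gender": "F", "zip": "02139", "product": "drug-X"},
    {"date_of_birth": date(1961, 12, 1), "gender": "F", "zip": "02134", "product": "drug-Y"},
]

print(min_group_size(customers, ("date_of_birth", "gender", "zip")))  # 1
generalized = [generalize(r, TODAY) for r in customers]
print(min_group_size(generalized, ("age_band", "gender", "zip")))  # 2
```

The two customers become indistinguishable on age band, gender and ZIP prefix, raising the minimum group size from 1 to 2. Wider bands push that number higher but strip out more analytical value, which is the trade-off described above.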

More information:

  • Deloitte and Touche's Russell Jones helps answer an enterprise's two biggest questions: Where is its data, and how is it handled?
  • Michael Cobb explains which tools can keep personally identifiable information (PII) out of access logs.


All Rights Reserved, Copyright 2003 - 2008, TechTarget