
Should confidential data be indexed or used as the index key?

A recent attack uses a series of insert operations to find weaknesses in the database's indexing algorithm. Michael Cobb explains the nature of the threat and what it means for customer data.

I've read that confidential database data shouldn't be indexed or used as the index key. What does that mean, and what best practices should I employ to ensure that this isn't a problem in my organization?
Database indexes are much like the indexes in textbooks: they provide quick reference points for where to find requested data. They reduce the work the database server must do and speed up data retrieval. In a relational database, every table should have an indexed primary key whose sole purpose is to uniquely identify each record. To keep the technical implementation of the database separate from the business logic, this primary key value should not have any real-life significance.

A table of a bank's customers, for example, may well have a column storing each customer's unique bank account number. While the account number could serve as a primary key, the advice above favors a meaningless surrogate value instead; either way, the primary key's value distinguishes each row of customer data.
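The surrogate-key arrangement can be sketched as follows. This is a minimal illustration using SQLite; the table and column names are hypothetical, not taken from the article.

```python
import sqlite3

# Illustrative schema: an auto-assigned surrogate key with no business
# meaning serves as the primary key, while the business identifier
# (the account number) lives in an ordinary, uniquely constrained column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id  INTEGER PRIMARY KEY,   -- surrogate key, no real-life meaning
        account_no   TEXT NOT NULL UNIQUE,  -- business identifier
        name         TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO customers (account_no, name) VALUES (?, ?)",
             ("GB29-0001", "A. Customer"))
row = conn.execute("SELECT customer_id, account_no FROM customers").fetchone()
print(row)  # (1, 'GB29-0001')
```

Note that the `UNIQUE` constraint on `account_no` creates an implicit index on that column, which is exactly the kind of index on real-world data the article goes on to discuss.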

To speed up the retrieval of customer data, the bank account number or the Social Security number of each customer, for example, can be indexed. The arrangement allows bank staff to quickly search the database using that particular piece of information. These indexes, however, are the focus of a new timing attack technique demonstrated by researchers from Core Security Technologies. The attack uses a series of insert operations to find weaknesses in the database's indexing algorithm. Attackers can then extract data from indexed fields. The insertion commands do not exploit any application logic or code flaws; the functions are typically available to all database users.
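To make the measurement side of such an attack concrete, here is a heavily simplified sketch. It is illustrative only: the published technique targets specific index behavior in commercial databases, not SQLite, and the probe values and schema are invented for this example.

```python
import sqlite3
import time

# Simplified sketch: an attacker who can only INSERT times each insert.
# Inserts that land near existing keys in the index structure can behave
# measurably differently, leaking information about the indexed
# (confidential) values. This code only shows the timing-collection step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ssn TEXT)")
conn.execute("CREATE INDEX idx_ssn ON t (ssn)")

timings = []
for probe in ("000-00-0000", "500-00-0000", "999-99-9999"):
    start = time.perf_counter()
    conn.execute("INSERT INTO t (ssn) VALUES (?)", (probe,))
    timings.append((probe, time.perf_counter() - start))

# A real attacker would repeat this many times and compare the timing
# distributions statistically to narrow down the stored values.
print(len(timings))
```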

The initial defensive recommendation is to not use indexes on confidential data. Without indexes, however, data retrieval is complex. To find the particular row matching a given bank account or Social Security number, the database server would have to perform a full table scan to search every row in the customers' table. Complex queries across multiple tables also depend heavily on indexes. These delays would have a significant impact on performance and cripple most large commercial databases.
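The cost of dropping indexes can be seen directly in a query planner's output. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` (the table and column names are illustrative): without an index the lookup is a full table scan; with one it becomes a direct index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (account_no TEXT, name TEXT)")

# No index yet: the planner must examine every row.
plan_scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM customers WHERE account_no = ?",
    ("GB29-0001",)).fetchone()[-1]
print(plan_scan)  # detail mentions a SCAN of the table

# After indexing the column, the same query uses the index.
conn.execute("CREATE INDEX idx_acct ON customers (account_no)")
plan_index = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM customers WHERE account_no = ?",
    ("GB29-0001",)).fetchone()[-1]
print(plan_index)  # detail mentions a SEARCH using the index
```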

While there are no reports of this attack being used in the wild, it is a plausible threat. Database administrators should monitor log files more closely for abnormal, repetitive insert activity, and application firewalls will need to be tuned to detect such unusual patterns. For new databases, architects must make some modifications to the data model and application code: each table column that must be indexed gets a corresponding column storing a hash value of the confidential data, and that hash value is indexed instead. Because the attacker cannot calculate the confidential data from its hash, the attack is effectively negated. Applications can still search for the confidential data efficiently by querying the indexed hash column, passing the hashed value of the search term as the criteria.
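The hash-column mitigation described above can be sketched as follows. This is a minimal illustration assuming SQLite and SHA-256; the article does not prescribe a particular database or hash function, and the schema is hypothetical.

```python
import hashlib
import sqlite3

def ssn_hash(ssn: str) -> str:
    # Plain SHA-256 for illustration. In practice a keyed hash (e.g. HMAC
    # with a secret key) would better resist brute-force guessing of
    # low-entropy values such as SSNs.
    return hashlib.sha256(ssn.encode()).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id       INTEGER PRIMARY KEY,
        ssn      TEXT NOT NULL,   -- confidential value: stored, but NOT indexed
        ssn_hash TEXT NOT NULL,   -- indexed stand-in for the SSN
        name     TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_ssn_hash ON customers (ssn_hash)")

ssn = "123-45-6789"
conn.execute("INSERT INTO customers (ssn, ssn_hash, name) VALUES (?, ?, ?)",
             (ssn, ssn_hash(ssn), "A. Customer"))

# The application hashes the search term and queries the indexed hash
# column, so lookups stay fast without indexing the raw confidential data.
row = conn.execute("SELECT name FROM customers WHERE ssn_hash = ?",
                   (ssn_hash(ssn),)).fetchone()
print(row[0])  # A. Customer
```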

More information:

  • James Foster demystifies database compliance.
  • Visit SearchSecurity.com's Data Protection School.

This was last published in October 2007
