As large amounts of sensitive data continue to pour out of government agencies, enterprises and companies of every size, shape and description, help may be on the way from some unlikely sources.
Xerox Corp., of Stamford, Conn., on Monday announced a new technology, known as intelligent redaction, which enables users to selectively encrypt different portions of a given document for different recipients. The software is able to take into account the context of words or phrases in a document and determine when to redact the content and when to leave it readable.
The new technology could be a savior for financial services companies, health care providers and other organizations that must deal with confidential information on a mass scale. Employees in these organizations often face situations in which they're forced to either severely restrict the circulation of certain information or redact large amounts of it in order to comply with the lowest common denominator of access privileges. A tool such as intelligent redaction could allow for wider distribution of information with higher security at the same time.
Using the new software, the author of a document identifies sensitive portions of the text, and the software then either encrypts those portions or leaves them as plain text, depending on the reader. The system also flags potentially sensitive content on its own, such as employee identification numbers, names or Social Security numbers, and then lets the author select the text that actually needs to be redacted. The idea is to automate as much of the redaction process as possible, while still giving users the final say over which recipients can read which parts of the document.
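The workflow described above can be illustrated with a minimal Python sketch. This is not Xerox's implementation: the function names, the Social Security number pattern and the reader labels are all hypothetical, and simple masking stands in for the per-recipient encryption the article describes.

```python
import re

# Hypothetical pattern for U.S. Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def suggest_spans(text):
    """Return (start, end) spans that look sensitive, for the author to review."""
    return [m.span() for m in SSN_PATTERN.finditer(text)]


def render_for(text, spans, allowed_readers, reader):
    """Mask each approved span unless this reader is allowed to see it.

    spans: (start, end) pairs the author has approved for redaction.
    allowed_readers: dict mapping a span to the set of readers who may see it.
    Masking is an assumption here; it stands in for real encryption.
    """
    out = []
    cursor = 0
    for start, end in sorted(spans):
        out.append(text[cursor:start])
        if reader in allowed_readers.get((start, end), set()):
            out.append(text[start:end])
        else:
            out.append("[REDACTED]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


doc = "Employee Jane Roe, SSN 123-45-6789, filed the claim."
spans = suggest_spans(doc)                  # software suggests candidates
access = {span: {"hr"} for span in spans}   # author decides who may read them
hr_view = render_for(doc, spans, access, "hr")
vendor_view = render_for(doc, spans, access, "vendor")
```

In this sketch the "hr" reader sees the document unchanged, while any other reader sees the number replaced by a placeholder. The author stays in control because only the spans placed in `access` are ever revealed.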
"There is some natural language processing in there, but humans are very much in the loop," said Jessica Staddon, the area manager of the security and privacy research group at Xerox's famed Palo Alto Research Center (PARC), which is leading the development of the new software. "I can write a rule and then apply it and see the effects of it and have the ability to tune it. You can apply rules at the word level or even at the sentence or paragraph levels."
In the process of developing the software, the PARC researchers went out and talked to potential users in a number of different fields, including lawyers, medical records clerks and others. What they found was that people handle the redaction process in different ways, depending upon their roles and the kind of information in question.
"We found that the people in the medical field often have to respond to subpoenas for medical records and there are some classes of information that they have to redact, like HIV status, any information on psychiatric care and things like that," said Staddon. "These organizations were maintaining manual lists of medications and other information that should be redacted. It was a very manual, laborious process. If our software was running locally on someone's PC, it could automatically generate the list of these terms that need to be redacted and help improve the speed and accuracy of the process."
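The manual term lists Staddon describes could, in principle, be applied mechanically once they exist. The Python sketch below assumes a short, purely illustrative term list (not a clinical one) and hypothetical function names. It shows only the matching step, not how the PARC software would generate such a list.

```python
import re

# Illustrative terms only; a real list would be maintained by the organization.
SENSITIVE_TERMS = ["HIV", "zidovudine", "psychiatric"]


def build_matcher(terms):
    """Compile one case-insensitive, whole-word pattern from a term list."""
    # Longest terms first, so a longer term wins over a shorter one it contains.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_terms(text, matcher):
    """List (offset, matched text) pairs for a human reviewer to confirm."""
    return [(m.start(), m.group(0)) for m in matcher.finditer(text)]


def redact(text, matcher):
    """Replace every matched term with a placeholder."""
    return matcher.sub("[REDACTED]", text)


record = "Patient referred for psychiatric care; prescribed Zidovudine."
matcher = build_matcher(SENSITIVE_TERMS)
flags = flag_terms(record, matcher)
clean = redact(record, matcher)
```

Keeping `flag_terms` separate from `redact` mirrors the human-in-the-loop approach described earlier: a clerk can review the flagged terms before any text is removed.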
The usage model is considerably different in a setting such as a law office, Staddon said. In most cases, the redaction process for legal documents is a collaborative effort that may involve a junior lawyer who does the initial review, a subject-matter expert and perhaps a senior partner. In that case, the intelligent redaction software might run on a central server instead of users' machines, she said.
The technology is still under development, and Staddon said that while it's possible it could be integrated into some Xerox offerings as early as next year, it's more likely that it will be a couple of years before it is ready for full release.