Once again, Department of Defense data was found publicly exposed in cloud storage, but it is unclear how sensitive the data may be.
Chris Vickery, cyber-risk analyst at UpGuard, based in Mountain View, Calif., found the exposed data in publicly accessible Amazon Web Services Simple Storage Service (S3) buckets. This is the second time Vickery has found exposed data from the Department of Defense (DOD) on AWS. The previous exposure was blamed on government contractor Booz Allen Hamilton.
UpGuard said a now-defunct private-sector government contractor named VendorX appeared to be responsible for building this database. However, it is unclear whether VendorX was responsible for exposing the data. Vickery has also previously found exposed data in AWS buckets from the Republican National Committee, World Wrestling Entertainment, Verizon and Dow Jones & Co.
According to Dan O'Sullivan, cyber-resilience analyst at UpGuard, Vickery found three publicly accessible DOD buckets on Sept. 6, 2017.
"The buckets' AWS subdomain names -- 'centcom-backup,' 'centcom-archive' and 'pacom-archive' -- provide an immediate indication of the data repositories' significance," O'Sullivan wrote in a blog post. "CENTCOM refers to the U.S. Central Command, based in Tampa, Fla., and [is] responsible for U.S. military operations from East Africa to Central Asia, including the Iraq and Afghan Wars. PACOM is the U.S. Pacific Command, headquartered in Aiea, Hawaii, and [covers] East, South and Southeast Asia, as well as Australia and Pacific Oceania."
UpGuard estimated the total exposed data in the AWS buckets amounted to "at least 1.8 billion posts of scraped internet content over the past eight years." The exposed data was all scraped from public sources, including news sites, comment sections, web forums and social media.
"While a cursory examination of the data reveals loose correlations of some of the scraped data to regional U.S. security concerns, such as with posts concerning Iraqi and Pakistani politics, the apparently benign nature of the vast number of captured global posts, as well as the origination of many of them from within the U.S., raises serious concerns about the extent and legality of known Pentagon surveillance against U.S. citizens," O'Sullivan wrote. "In addition, it remains unclear why and for what reasons the data was accumulated, presenting the overwhelming likelihood that the majority of posts captured originate from law-abiding civilians across the world."
Importance of the exposed DOD data
Vickery found references in the exposed data to the U.S. Army "Coral Reef" intelligence analysis program, which is designed "to better understand relationships between persons of interest," but UpGuard ultimately would not speculate on why the DOD gathered the data.
Ben Johnson, CTO at Obsidian Security in Newport Beach, Calif., said such a massive data store could be very valuable if processed properly.
"Data often provides more intelligence than initially accessed, so while this information was previously publicly available, adversaries may be able to ascertain various insights they didn't previously [have]," Johnson told SearchSecurity. "What's more of a problem than the data itself in this case is that this is occurring at all -- showcasing that there's plenty of work to do in safeguarding our information."
Rebecca Herold, president of Privacy Professor, noted that the DOD's having collected the data from public sources does not necessarily mean the exposed data is accurate.
"Sources of, and reliability for, the information matters greatly. Ease of modifying even a few small details within a large amount of data can completely change the reality of the topic being discussed. Those finding this information need to take great caution to not simply assume the information is all valid and accurate," Herold told SearchSecurity. "Much of this data could have been manufactured and used for testing, and much of it may have been used to lure attention, as a type of honeypot, and may contain a great amount of false information."
Herold added that the exposed data had worrying privacy implications.
"Just because the information was publicly available does not mean that it should have been publicly available. Perhaps some of this information also ended up being mistakenly made publicly available because of errors in configurations of storage servers, or of website errors," Herold said. "When we have organizations purposefully taking actions to collect and inappropriately -- though legally in many instances -- use, share and sell personal information, and then that information is combined with all this freely available huge repositories of data, it can provide deep insights and revelations for specific groups and individuals that could dramatically harm a wide range of aspects within their lives."