Friday, August 31, 2007

Data Mining: Digging Deep To Thwart Terrorism

The use of data mining reportedly helped unmask a terrorist leader months before 9/11, but there are concerns about coordination and privacy

26 Terabytes of Data

The Navy mines large volumes of data each day, but converting it into intelligence is still the work of human analysts.

* New software tools cannot determine the significance of data.

* An executive office to foster coordination among data mining programs could be helpful.

* Coming soon: Project Rockwell will plumb the depths of news reports.

Recent reports by The New York Times and Fox News that the Pentagon identified 9/11 ringleader Mohammed Atta as part of a U.S.-based terrorist cell months before the attacks on Washington and New York have sparked new interest in, and controversy about, the Defense Department's relatively nascent ability to assess huge volumes of data for patterns of behavior that indicate terrorists and their activities.

According to press reports, Atta was identified in early 2000 by several military officers, including Navy Capt. Scott J. Phillpott, who managed a Pentagon program called "Able Danger" that employed an analytical process called "data mining." The process allows intelligence analysts armed with specially designed software to aggregate multiple data sources, such as lists of terrorists and decades of reporting by the Associated Press, and search for specific patterns of behavior, anomalies and relationships. The findings become the basis for refined analyses by intelligence specialists.

The New York Times reported in August that Defense Department lawyers forced the cancellation of three meetings at which military officials involved with "Able Danger" were to report Atta's name to the FBI after the program identified him. These claims have not been confirmed by the Pentagon.

U.S. Rep. Curt Weldon, R-Pa., who arranged a meeting between the news agencies and Phillpott, released a statement in late August describing the program's objective as "to identify and target al Qaeda on a global basis, and, through the use of cutting-edge technology ... to manipulate, degrade or destroy the global al Qaeda infrastructure."

After the public speculation about "Able Danger," the 9/11 Commission stated Aug. 12 that it had learned about the program in October 2003. Initial informants did not mention Atta or any other future hijackers. In July 2004, a different informant knowledgeable about "Able Danger" told the Commission he had seen Atta's name and photo in another analyst's notes. However, this informant was not able to substantiate that assertion to the satisfaction of the Commission, and "Able Danger" was not mentioned in the Commission's final report.

The alleged identification of Atta has attracted high-profile attention to the potential of data mining technologies and processes as intelligence tools. However, the usage and processes of data mining remain relatively immature in the military arena.

One official told Seapower that coordination of data-mining efforts and requirements between federal agencies should be much improved. Also, implementation and oversight issues remain a key challenge in balancing the use of data-mining tools with privacy concerns.

Data mining is not new. Industry has reaped benefits from it in sectors such as health care, insurance and banking. But the lack of coordination between government agencies sometimes creates barriers that prevent valuable intelligence from reaching the proper authorities.

At the forefront of acquisition and development of Navy data-mining tools are the Space and Naval Warfare Systems Command, the Naval Research Laboratory and the Office of Naval Intelligence (ONI). There is little to no coordination between these commands to acquire data-mining tools in concert, a Navy official said, adding that one of the biggest problems with Navy data-mining tools is the number of various commands working on acquiring these tools, "some of which overlap, and it's not always as well coordinated as it could be."

The official suggested establishing a maritime domain awareness program executive office as a means to "deconflict" some of the divergent acquisition of data-mining tools between commands, which leads to conflicts in data and hardships in comparing data sets. The Navy had no comment on the plausibility of this suggestion.

"There have been times where ONI needed information that existed in other agencies' data sources" and it was not available, the Navy official said. "It's certainly not seamless and it's not as well integrated as it could be. Today, there are still lots of places where things can fall through the cracks and where connections might not be made.

"For example, there is not a single source of, or a single list of, terrorists" that all intelligence commands share, the official said. "If someone boards a ship in the Mediterranean and gets a crew list of people who are on that ship and that ship's en route to the United States, we can take that crew list but we have to run it against multiple lists to see if anybody who's on that ship pops up as a bad guy. ... It could be easy to not check against somebody's database."
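The screening problem the official describes can be pictured with a short sketch. It is illustrative only: the list names, the people and the helper functions `normalize_name` and `screen_crew` are invented here, and nothing in the article describes the Navy's actual software. The point is that every list has to be checked, so a list left out of the lookup is a list never searched.

```python
def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(name.lower().split())


def screen_crew(crew, watch_lists):
    """Return {crew member: [lists containing that name]} for every match found."""
    # Normalize each list once so each lookup is a simple set membership test.
    normalized = {
        source: {normalize_name(n) for n in names}
        for source, names in watch_lists.items()
    }
    hits = {}
    for member in crew:
        key = normalize_name(member)
        sources = sorted(s for s, names in normalized.items() if key in names)
        if sources:
            hits[member] = sources
    return hits


# Hypothetical data: list names and people are invented for illustration.
watch_lists = {
    "list_a": ["John Doe", "Alex Example"],
    "list_b": ["alex  example", "Sam Sample"],
}
crew = ["Alex Example", "Pat Placeholder", "Sam Sample"]

hits = screen_crew(crew, watch_lists)
for member, sources in hits.items():
    print(f"{member}: found on {', '.join(sources)}")
```

Exact matching of this kind would miss transliterations and aliases, which is one reason real name screening is considerably harder than the sketch suggests.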

ONI shares a working relationship with Naval Networks Commander Vice Adm. James McArthur, who wears a lesser-known hat as the assistant chief of naval operations for Information Technology. McArthur's office provides oversight and guidance to validate ONI's information technology spending on tools such as data mining.

McArthur's office was reluctant to discuss these tools because of the "Able Danger" controversy, citing their immaturity and the relative lack of "concrete" examples of how they can be used successfully, according to a Navy spokesperson.

Several experts told Seapower that data mining is destined to be a valuable asset in the war on terror, but should be viewed as a capability with advantages and limitations rather than a cure-all for the nation's growing intelligence requirements.

Jeffrey W. Seifert, an analyst in information science and technology policy for the Resources, Science and Industry division of the Congressional Research Service, released an overview of data mining last December. The report identifies one limitation of data mining: the tools cannot determine the value or significance of intelligence. It also notes that data-mining tools cannot determine causal relationships.

"For example, an application may identify that a pattern of behavior, such as the propensity to purchase airline tickets just shortly before a flight is scheduled to depart, is related to characteristics such as income, level of education and Internet use. However, that does not necessarily indicate that the ticket purchasing behavior is caused by one or more of these variables," the report states.

Regardless of the particular data-mining tool or its limitations, the first step in data mining is to concentrate data into a single, normalized architecture or data model. That can be done physically, by actually moving all the data into a common disk form, or "data warehouse," so it can then be digested to resolve ambiguities, or the sorting can be done automatically by a computer. For example, if one set of data is recorded in meters and another is recorded in feet, the data-mining process would first convert them to a common unit so that the tools produce a consistent outcome when they are run against the data set. Once data is normalized, the tools scan through it and create a statistical model.
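The meters-and-feet example can be made concrete with a minimal sketch. The record layout, the field names and the function `normalize_record` are assumptions made for illustration; the article does not say how any particular tool stores or converts its data.

```python
METERS_PER_FOOT = 0.3048  # exact by international definition of the foot


def to_meters(value: float, unit: str) -> float:
    """Convert a length to meters; reject units this sketch does not know."""
    if unit == "m":
        return float(value)
    if unit == "ft":
        return value * METERS_PER_FOOT
    raise ValueError(f"unsupported unit: {unit!r}")


def normalize_record(record: dict) -> dict:
    """Return a copy of the record with its length expressed in meters only."""
    return {
        "vessel": record["vessel"],
        "length_m": to_meters(record["length"], record["unit"]),
    }


# Hypothetical records from two sources that use different units.
records = [
    {"vessel": "A", "length": 100.0, "unit": "m"},
    {"vessel": "B", "length": 328.0, "unit": "ft"},
]

normalized = [normalize_record(r) for r in records]
for row in normalized:
    print(f"{row['vessel']}: {row['length_m']:.2f} m")
```

After this step both vessels are described in the same unit, so a later comparison or statistical model is not skewed by the difference in how the sources recorded length.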

Data-mining tools look through the existing data and identify patterns. From those patterns, anomalies, or out-of-place data patterns, are recognized and then analyzed. One notable outcome from the analysis of these patterns is the ability to make predictions about what is missing in the data, or what elements of data are not included.
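One simple way to picture "a statistical model, then anomalies" is a z-score check: summarize past values by their mean and standard deviation, then flag new values that sit far from that norm. This is a textbook technique chosen for illustration, not a description of any tool named in the article; the transit times and the threshold of 3 are invented.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """Summarize historical values as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def find_anomalies(observations, baseline, threshold=3.0):
    """Return observations whose z-score against the baseline exceeds the threshold."""
    center, spread = baseline
    return [x for x in observations if abs(x - center) / spread > threshold]


# Hypothetical transit times, in days, for one shipping route.
history = [12, 11, 13, 12, 12, 11, 13, 12]
new_observations = [12, 13, 30]

baseline = fit_baseline(history)
anomalies = find_anomalies(new_observations, baseline)
print("flagged:", anomalies)
```

A flagged value is only a prompt for a human analyst to look more closely; as the report quoted above notes, the tool itself cannot say whether the anomaly is significant.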

This, however, is an extremely difficult task when working with 26 terabytes of active data on a daily basis, an amount that would fill up about 85 high-end 300-gigabyte hard drives each day. The quantity of information the Navy processes is also growing at a rate of 10 percent per year, according to ONI.
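The scale can be checked with back-of-the-envelope arithmetic. The sketch below assumes decimal units (1 terabyte = 1,000 gigabytes) and assumes the 10 percent growth compounds annually; neither assumption is stated in the article. Under them, the daily volume works out to a little under 87 drives, in the same range as the rounded figure of about 85.

```python
GB = 10**9   # decimal gigabyte, an assumption of this sketch
TB = 10**12  # decimal terabyte, an assumption of this sketch

DAILY_VOLUME_TB = 26
DRIVE_SIZE_GB = 300
ANNUAL_GROWTH = 0.10  # assumed to compound once per year

# Number of 300 GB drives needed to hold one day's data.
drives_per_day = (DAILY_VOLUME_TB * TB) / (DRIVE_SIZE_GB * GB)


def projected_daily_tb(years: int) -> float:
    """Daily volume in terabytes after the given number of years of growth."""
    return DAILY_VOLUME_TB * (1 + ANNUAL_GROWTH) ** years


print(f"drives per day: {drives_per_day:.1f}")
for years in (1, 5, 10):
    print(f"after {years} years: {projected_daily_tb(years):.1f} TB per day")
```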

Nonetheless, data mining is an asset to government agencies that have taken on new roles in the aftermath of 9/11.


A new interest of the Navy and other government agencies is to track the movement of more than 130,000 commercial vessels and the 17 million cargo containers they carry, which could be used by terrorists as a means of attack against U.S. ports, or to smuggle arms or people into the country. ONI looks at transit plans, bills of lading, intelligence reports, and years of reporting by internal analysts and news agencies to identify vulnerabilities or suspicious activity within the shipping industry. Today, the Navy is shifting its focus from the ships themselves to terrorist use of the commercial shipping network, according to a Navy source.

"Many of the problems that we're looking at in the commercial shipping industry are very much analogous to fraud detection; we want to track norms and we want to identify things that are outside of the norm," said the Navy official.

There are typically 10,000 messages on an analyst's desk at ONI every morning. One tool ONI has been exploring, and is deploying this fall to approximately three dozen workstations, is Project Rockwell. Derived from another agency and an industry partner, Project Rockwell allows analysts to go through open wire news feeds, such as Reuters or the Associated Press, and run queries against the feeds in the areas that they have highlighted.

If there is a subject an analyst has particular interest in, they can highlight it, and pertinent information will be color-coded on their desktop. For example, if there is a topic of concern that normally has one news feed pertaining to it and suddenly there are hundreds of feeds, Project Rockwell brings that information to the analyst's attention and directs them to that topic or subject of interest.
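The article does not explain how Project Rockwell works internally. As a rough sketch of the general idea only, the snippet below counts how many messages mention each watched topic and flags a topic when today's count is far above its usual level. The topics, messages, baseline counts, thresholds and function names are all hypothetical.

```python
from collections import Counter


def count_topics(messages, topics):
    """Count how many messages mention each topic (case-insensitive substring match)."""
    counts = Counter()
    for text in messages:
        lowered = text.lower()
        for topic in topics:
            if topic.lower() in lowered:
                counts[topic] += 1
    return counts


def spiking_topics(today, baseline, factor=10, min_count=5):
    """Return topics whose count today is at least `factor` times the usual count."""
    return sorted(
        topic
        for topic, n in today.items()
        if n >= min_count and n >= factor * max(baseline.get(topic, 0), 1)
    )


# Hypothetical data: usual daily counts and one day's incoming messages.
baseline = {"port closure": 1, "piracy": 4}
messages = (
    ["Port closure reported at harbor"] * 12
    + ["Piracy incident off the coast"] * 5
    + ["Weather update"] * 3
)

today = count_topics(messages, baseline.keys())
alerts = spiking_topics(today, baseline)
print("topics to review:", alerts)
```

Here the topic that normally appears once but shows up twelve times is surfaced, while the topic running near its usual level is not, which leaves the analyst to judge what the surge means.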

"What it allows them to do is go through the thousands of messages that they would get normally in a day and does it four times faster," said the Navy official. "That's not taking the man out of the loop, but it's certainly freeing up the man to do more analysis and less data sorting and initial review."

In the homeland security realm, there are some legal privacy constraints, not necessarily restrictions, on sharing information outside of Department of Defense boundaries, depending on what that information is. Intelligence commands, for example, have limitations on how and how long they can retain information on U.S. persons or companies.

"What we're hoping to build is a capability that, if we can't keep the data, will allow us to connect the data that might be held by the FBI or by the U.S. Coast Guard, as examples of law enforcement agencies, so they can easily extract value from our data," said the official.

