Information Systems and Computing
Connecting the Dots to Enable Information Operations
Information Operations (IO)
IO is the integrated use of electronic warfare, computer network operations, psychological operations, military deception and operations security.1
IO creates huge amounts of disparate data, each piece of which, by itself, may not be particularly meaningful. These data must be parsed, understood, fused and analyzed before an actionable picture can emerge from the disconnected dots. Currently, the military can process only 20 percent of the available sensor data.2 To realize the potential of emerging IO technologies, significant improvements in data processing and analysis are needed. Knowledge exploitation (Kx) is an emerging Raytheon capability that addresses these issues.
Kx integrates elements of four complementary technologies: knowledge management (KM), information fusion (IF), knowledge discovery (KD) and semantic processing. Knowledge management addresses the effective organization and retrieval of source material and data streams. Information fusion melds related data to eliminate redundancy, reduce uncertainty, provide situation awareness, and enable effective decision-making and resource allocation. Knowledge discovery techniques find non-obvious relationships, patterns and trends buried within the mounds of data; and semantic processing establishes the meaning of data.3 This article shows how these knowledge exploitation technologies can be integrated to help "connect the dots" to enable IO, with a specific emphasis on semantic processing and knowledge discovery.
Semantic Processing and Knowledge Discovery
IO involves diverse entities such as people, computers, networks, software, infrastructure and organizations, as well as more abstract concepts such as effects, vulnerabilities and cultural biases. Relational databases (RDBs) are very good at storing many instances of similar, well-structured data; semantic processing, by contrast, easily captures and manipulates many diverse concepts and the relationships among them. Semantic processing is thus well suited to represent and manipulate the concepts of the IO domain.
The starting point for capturing knowledge in a semantic system is to describe the framework of the problem domain; in our case, IO. The framework is captured in a knowledge model4 that consists of three parts: concepts, relationships and rules. The concepts and relationships5 can be represented together as a concept map such as that in Figure 1, which depicts a part of the IO domain. Here, concepts are represented as nodes of a graph and their relationships as annotated edges.
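The concepts and relationships of such a knowledge model can be sketched in code as a labeled, directed graph. The concept and relationship names below are illustrative placeholders, not taken from Figure 1:

```python
# A minimal sketch of a knowledge model: concepts are node types and
# relationships are labeled, directed edges between concept types.
# All names here are hypothetical examples.

concepts = {"Person", "Organization", "Computer", "Network"}

# Each entry reads: (subject concept, relationship, object concept).
relationships = {
    ("Person", "worksFor", "Organization"),
    ("Person", "uses", "Computer"),
    ("Computer", "connectsTo", "Network"),
    ("Organization", "operates", "Network"),
}

def valid_relationship(subject_type, predicate, object_type):
    """Check whether a proposed assertion type is defined in the model."""
    return (subject_type, predicate, object_type) in relationships

print(valid_relationship("Person", "uses", "Computer"))     # True
print(valid_relationship("Network", "worksFor", "Person"))  # False
```

In a production system this framework would be expressed in a formal ontology language rather than hand-coded sets, but the structure — typed nodes and labeled edges — is the same.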
Knowledge extractors are used to convert information from data sources to specific instances of the defined concepts and relationships in the knowledge model. These facts are called assertions. Knowledge extractors can operate on sensor data, RDBs, Web pages or unstructured text, and they allow us to capture a rich set of assertions about the IO domain.
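A simple extractor over unstructured text can be sketched as pattern matching that emits subject – relationship – object assertions. The patterns and entity names below are hypothetical; real extractors use far more sophisticated natural-language techniques:

```python
import re

# Illustrative extraction patterns mapping sentence forms to relationships.
PATTERNS = [
    (re.compile(r"(\w+) works for (\w+)"), "worksFor"),
    (re.compile(r"(\w+) uses (\w+)"), "uses"),
]

def extract_assertions(text):
    """Scan unstructured text, emitting (subject, relationship, object) triples."""
    assertions = []
    for pattern, relation in PATTERNS:
        for subject, obj in pattern.findall(text):
            assertions.append((subject, relation, obj))
    return assertions

report = "Alice works for AcmeCorp. Alice uses Laptop42."
print(extract_assertions(report))
# [('Alice', 'worksFor', 'AcmeCorp'), ('Alice', 'uses', 'Laptop42')]
```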
Each assertion specifies a relationship between two entities — the subject entity and the object entity — which are specific instances of the concepts in the knowledge model. An assertion can be thought of as the node – edge – node construct of a graph corresponding to the subject – relationship – object data pattern. For example, the graph of Figure 2 would be constructed from independent observations and would consist of specific instances of the concepts defined in Figure 1.
Constructing this graph from individual assertions requires us to recognize when two instances of an entity, possibly reported by different sources, represent the same entity. This process, which links individual assertions, is a specific form of information fusion6 called object refinement. Object refinement can be implemented by the third element of the knowledge model, rules.
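Object refinement can be sketched as follows, assuming the simplest possible matching rule — two reported entities are the same if their names match after normalization. Real systems apply much richer similarity rules; all names here are illustrative:

```python
# A minimal object-refinement sketch: entities whose normalized names match
# are merged under a single canonical identifier, and assertions are
# relinked to that identifier.

def normalize(name):
    """Reduce a reported name to a crude comparison key."""
    return name.lower().replace(".", "").strip()

def refine(assertions):
    """Merge duplicate entities and relink their assertions."""
    canonical = {}
    refined = []
    for subject, relation, obj in assertions:
        s = canonical.setdefault(normalize(subject), subject)
        o = canonical.setdefault(normalize(obj), obj)
        refined.append((s, relation, o))
    return refined

raw = [
    ("J. Smith", "worksFor", "AcmeCorp"),
    ("j smith", "uses", "Laptop42"),  # same person, reported by another source
]
print(refine(raw))
# [('J. Smith', 'worksFor', 'AcmeCorp'), ('J. Smith', 'uses', 'Laptop42')]
```

After refinement, the two reports about "J. Smith" attach to one node in the graph, which is what eliminates duplicates and reduces uncertainty.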
Rules can also be used to infer additional facts and patterns in the graph and can identify situations that need to be acted upon. For example, appropriate rules applied to the above graph would reasonably generate an alert that country ZZZ could be exfiltrating sensitive defense information.
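A rule of this kind can be sketched as simple forward chaining over the assertion set. The rule, relationships and entities below are hypothetical stand-ins for the richer patterns a real system would apply:

```python
# A minimal forward-chaining rule sketch:
# if X worksFor Org, X uses C, and C connectsTo N,
# then infer (Org, exposedVia, N).

def apply_rule(assertions):
    """Infer (organization, 'exposedVia', network) facts from base assertions."""
    works = {(s, o) for s, r, o in assertions if r == "worksFor"}
    uses = {(s, o) for s, r, o in assertions if r == "uses"}
    conn = {(s, o) for s, r, o in assertions if r == "connectsTo"}
    inferred = set()
    for person, org in works:
        for p, computer in uses:
            if p != person:
                continue
            for c, network in conn:
                if c == computer:
                    inferred.add((org, "exposedVia", network))
    return inferred

facts = [
    ("Alice", "worksFor", "AcmeCorp"),
    ("Alice", "uses", "Laptop42"),
    ("Laptop42", "connectsTo", "NetZZZ"),
]
print(apply_rule(facts))  # {('AcmeCorp', 'exposedVia', 'NetZZZ')}
```

An alerting layer would then watch for inferred facts such as this one and escalate them to an analyst.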
Because we are representing knowledge in the form of a graph, we can employ graph algorithms, in addition to rules, to discover patterns and relationships of interest. Many important non-obvious relationships often appear as multi-hop links chained through several nodes in the graph. Graph algorithms can be used to discover these graph paths and assert the non-obvious relationships between nodes. Our Raytheon IIS partners are developing special-purpose, high-speed graph processors that will enable us to efficiently implement knowledge-discovery algorithms on huge graphs.
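Multi-hop discovery of this kind can be sketched with an ordinary breadth-first search over the assertion graph. The entities and relationships below are illustrative:

```python
from collections import deque

# A sketch of multi-hop relationship discovery: breadth-first search
# finds the chain of labeled edges linking two entities, surfacing a
# non-obvious relationship between them.

def find_path(assertions, start, goal):
    """Return the edge-labeled path from start to goal, or None."""
    adjacency = {}
    for s, r, o in assertions:
        adjacency.setdefault(s, []).append((r, o))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None

facts = [
    ("Alice", "worksFor", "AcmeCorp"),
    ("Alice", "uses", "Laptop42"),
    ("Laptop42", "connectsTo", "NetZZZ"),
    ("NetZZZ", "operatedBy", "CountryZZZ"),
]
print(find_path(facts, "Alice", "CountryZZZ"))
```

On a graph of realistic size this search is exactly the kind of workload that benefits from the special-purpose graph hardware mentioned above.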
In addition to the graph representation of knowledge, it is also common to capture assertions as triples in the form subject – relationship – object. The Resource Description Framework (RDF),7 Web Ontology Language (OWL),8 and the SPARQL Protocol and RDF Query Language (SPARQL) are semantic Web standards that can be used to express a knowledge model as triples, enable reasoning to be performed by commercial off-the-shelf (COTS) reasoning engines, and provide a query/rule language for the model. These standards enable inferences and rule-based reasoning to be done, complementing graph exploitation algorithms to discover and infer new knowledge.
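The essence of querying a triple store can be sketched as matching a pattern with variables against stored triples, in the spirit of a SPARQL basic graph pattern. This is a toy in-memory illustration, not the SPARQL standard itself; all triples are hypothetical:

```python
# A sketch of triple-pattern matching over an in-memory triple store.
# Pattern terms beginning with "?" are variables; other terms must
# match the stored triple exactly.

def match(triples, pattern):
    """Yield one variable-binding dict per triple matching the pattern."""
    for triple in triples:
        binding = {}
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                binding[pat] = val
            elif pat != val:
                break
        else:
            yield binding

store = [
    ("Alice", "worksFor", "AcmeCorp"),
    ("Bob", "worksFor", "AcmeCorp"),
    ("Alice", "uses", "Laptop42"),
]
print(list(match(store, ("?who", "worksFor", "AcmeCorp"))))
# [{'?who': 'Alice'}, {'?who': 'Bob'}]
```

A full SPARQL engine joins many such patterns and adds filtering and inference, but the variable-binding idea is the same.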
RDF triples and knowledge graphs are different approaches to knowledge representation. Knowledge discovery can be done using the representation most likely to perform best for a particular problem. Current COTS RDF triple stores have limited storage capacity and reasoning performance. Dedicated, high-speed graph processors, such as those under development by Raytheon, will provide the high-speed reasoning, on huge data stores, required to address IO and similar problems.
1 Joint Publication 3-13, Information Operations, 13 February 2006.
2 Al Shaffer, principal deputy director, Defense Research and Engineering.
3 Semantic processing also establishes common shared meaning, which enables interoperability.
4 One common form of a knowledge model, called an ontology, uses formal description logic to express the semantics of a term.
5 In general, concepts can have attributes. For example, if the concept is a person, it is useful to capture attributes such as name, address, and date of birth. In many applications, it is also important to assign attributes to edges, such as the time an observation is valid.
6 Object refinement recognizes when two or more nodes represent the same entity and combines them, eliminating duplicates and reducing uncertainty. Object refinement is also called Level 1 fusion in the Joint Directors of Laboratories (JDL) fusion framework, the most widely accepted model of information fusion. See James Llinas, Christopher Bowman, Galina Rogova, Alan Steinberg, Ed Waltz and Frank White, "Revisiting the JDL Data Fusion Model II," 2004, for a discussion of the JDL model and refinements.
7 See http://www.w3.org/RDF for an overview of RDF.
8 See http://www.w3.org/TR/owl-features for an overview of OWL.
Author: Jean Greenawalt
Contributors: Don Kretz, Jim Jacobs, John Montgomery, John Moon, Tom Chung