With the rapid expansion in the variety and accessibility of military, commercial and open/free data sources, Raytheon’s customers are challenged to effectively leverage this information to detect, track and avert adversarial activity as part of daily Activity Based Intelligence (ABI) missions.
ABI is “…an analysis methodology which rapidly integrates data from multiple intelligence sources and sources around the interactions of people, events and activities, in order to discover relevant patterns, determine and identify change, and characterize those patterns to drive collection and create decision advantage.”¹ Detecting, characterizing and monitoring Patterns of Life (PoL) is a critical ABI input. A PoL is best understood as a model, created by analyzing entity, event and activity data, that describes patterns in repeated activities, ongoing interactions or periodic changes in state. In military applications, patterns of life are useful for detecting anomalies, predicting future actions and improving situational awareness across air, land, sea and space operating pictures. Today, practices within the Department of Defense (DoD) and Intelligence Community (IC) commonly rely on doctrine tuned to a single intelligence source and cannot process high volumes of data quickly enough to impact real-time decision making.
Raytheon’s Intersect Sentry™ capability meets the volume and velocity challenges of today’s data sources with services that detect, characterize and exploit patterns of life, at scale, in near real time. Leveraging big data machine learning for source fusion, target feature discovery and PoL modeling, it can make predictions about future target state. To form a complete common operating picture, Intersect Sentry combines customer data with open-source news data such as GDELT (Global Database of Events, Language and Tone); satellite imagery; video feeds of vehicle traffic flow; vessel track information from AIS (Automatic Identification System); air tracks from ADS-B (Automatic Dependent Surveillance – Broadcast); vehicle tracks from OpenStreetMap™; public utility patterns (e.g., electricity, water); and weather station reports from NOAA® (National Oceanic and Atmospheric Administration). Together, these sources tell a nuanced story of how an individual target behaves in space-time, how targets behave in aggregate, and how the activity at a location changes over time.
Intersect Sentry’s big data architecture gives the warfighter a decision advantage with real-time patterns of life, confidence metrics, and downstream exploitation for alerting and tipping. For example, today’s Air Force warfighter must manage thousands of potential threats to space assets per day, and the timeline for input into Space Situational Awareness systems is on the order of minutes across all orbital regimes. There are currently more than 4,000 maneuverable objects, and with rapidly growing constellations of SmallSats (Small Satellites), this number is projected to nearly double by 2022, making real-time patterns of life and predictive machine learning models critical decision aids for timely mission inputs. Similarly, Marines are responsible for missions such as enemy engagement, embassy protection, non-combatant evacuation and disaster relief. Planning for these missions requires establishing and continuously updating normal baselines so that anomalies can be quickly recognized, understood and, if necessary, mitigated in real time.
The Navy and Coast Guard are concerned with maritime domain awareness, including the detection and interdiction of smuggling, illegal fishing and other nefarious activities. These challenges require collecting open-source track and satellite information and fusing it with DoD and IC sources. Pattern of Life analysis of this multi-source data is required to identify and track targets, characterize intent, and predict future locations or actions that would not be evident from single-source analysis alone.
Through recent data partnerships, data acquisitions and existing proprietary sensor collection, Raytheon has unprecedented access to multi-INT (multiple intelligence) geo-temporal data sources whose read and update rates exceed human comprehension and sensemaking abilities. For example, Twitter has up to 330 million users and produces more than 500 million tweets per day; the Automatic Identification System (AIS) monitors more than 500,000 vessels with roughly 150 million reports per month; and the Global Database of Events, Language and Tone (GDELT) generated a graph of new events with nearly 2.5 trillion nodes in 2017 alone.
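To put those volumes on a per-second footing, here is a back-of-the-envelope conversion of the figures quoted above, assuming a 30-day month for the AIS rate:

```latex
\begin{aligned}
\text{Twitter: } & \frac{500 \times 10^{6}\ \text{tweets/day}}{86\,400\ \text{s/day}} \approx 5{,}787\ \text{tweets/s} \\[4pt]
\text{AIS: } & \frac{150 \times 10^{6}\ \text{reports/month}}{30 \times 86\,400\ \text{s/month}} \approx 57.9\ \text{reports/s}
\end{aligned}
```

These are averages, so peak rates are at least as high, and they are sustained around the clock before any other source is added.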
As evidenced by recent broad agency announcements (BAAs), requests for information (RFIs) and requests for proposal (RFPs), customers are seeking full exploitation of both commercial and military data assets for near real-time forecasts of target maneuvers, anomaly detection and activity assessments. Across military branches and the intelligence community, the new reality is that digital footprints, and the patterns they form, reveal adversarial intent when leveraged in a timely and comprehensive way. Raytheon has made a significant investment in developing the machine learning algorithms, products and systems needed to automate the creation and use of Patterns of Life at a scale that meets the demands of both the IC and DoD. These real-time patterns provide Raytheon customers with the actionable intelligence they need to monitor adversaries, coordinate direct action forces and provide mission planning or collection tasking inputs in an increasingly complex and dynamic environment.
BIG DATA PATTERNS OF LIFE ON APACHE SPARK®
Intersect Sentry’s Pattern of Life capabilities are part of a real-time multi-INT big data analytics ecosystem, illustrated in Figure 1. The Pattern of Life models are built forensically from multi-INT time series observation data, which Apache Spark² processes to extract patterns and generate predictive machine learning models. The resulting analytic products are pushed to a distributed Object Store and then queried by automated analytic agents that detect activity in near real time, assess potentially anomalous conditions and make predictions about future activity.
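The flow described above (forensic model building, publication to an object store, querying by near-real-time agents) can be illustrated with a minimal, single-process Python sketch. Everything here is an assumption made for illustration and is not Intersect Sentry’s implementation: observations are hypothetical `(cell_id, hour, count)` tuples, the “model” is a per-(cell, hour) mean and standard deviation, the object store is an in-memory dictionary (`InMemoryObjectStore` is a made-up name), and the agent flags counts more than `k` standard deviations from the historical norm.

```python
"""Sketch: forensic build -> object store -> near-real-time agent (illustrative only)."""
from collections import defaultdict
from statistics import mean, pstdev


class InMemoryObjectStore:
    """Hypothetical stand-in for a distributed object store."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects.get(key)


def build_models(history):
    """Forensic (batch) step: summarize historical counts per (cell, hour)."""
    grouped = defaultdict(list)
    for cell_id, hour, count in history:
        grouped[(cell_id, hour)].append(count)
    return {key: (mean(vals), pstdev(vals)) for key, vals in grouped.items()}


def publish(models, store):
    """Push each analytic product to the object store under its own key."""
    for key, summary in models.items():
        store.put(key, summary)


def agent_check(store, cell_id, hour, count, k=3.0):
    """Agent step: compare a live count with the stored historical norm."""
    summary = store.get((cell_id, hour))
    if summary is None:
        return "no-baseline"
    mu, sigma = summary
    if sigma == 0:
        return "anomalous" if count != mu else "normal"
    return "anomalous" if abs(count - mu) > k * sigma else "normal"


if __name__ == "__main__":
    history = [("A1", 9, c) for c in (10, 12, 11, 9, 13)]
    store = InMemoryObjectStore()
    publish(build_models(history), store)
    print(agent_check(store, "A1", 9, 12))  # within the norm
    print(agent_check(store, "A1", 9, 40))  # far outside the norm
```

In a production system the batch step would run as a distributed job and the agents would query the store concurrently; the separation between building and querying is what lets the two run on different schedules.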
With the volume and velocity of data coming from multi-INT sources, big data solutions are needed that scale with system load. Parallelizing updates to machine learning models and statistical summary data becomes increasingly important as the number of target entities grows beyond human operator capacity. Furthermore, incremental updates to global and local analytic products, triggered by data source updates, can quickly exceed the performance limits of even large server systems. The remaining sections present three big-data-capable machine learning products from Intersect Sentry’s Analytic Suite, along with their applications across the air, sea, ground and space domains.
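One common way to make statistical summaries both parallel and incremental is to store them in a mergeable form. The Python sketch below is an illustration, not the suite’s implementation: it keeps a count, mean and sum of squared deviations (`m2`) and combines two summaries with the standard pairwise update of Chan, Golub and LeVeque, so partitions can be summarized independently and merged in any order, and a new observation is simply a one-element summary merged into the existing one. The class name `RunningStats` is hypothetical.

```python
"""Sketch: mergeable summary statistics for parallel and incremental updates."""
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class RunningStats:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    @staticmethod
    def of(value: float) -> "RunningStats":
        """Summary of a single observation."""
        return RunningStats(1, float(value), 0.0)

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two summaries (pairwise update; order does not matter)."""
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance (0.0 for an empty summary)."""
        return self.m2 / self.n if self.n else 0.0


def summarize(values) -> RunningStats:
    """Summarize one partition by folding single-value summaries together."""
    return reduce(RunningStats.merge, (RunningStats.of(v) for v in values), RunningStats())


if __name__ == "__main__":
    left, right = summarize([1, 2, 3]), summarize([4, 5, 6, 7])  # e.g. two workers
    total = left.merge(right)
    updated = total.merge(RunningStats.of(8))  # incremental update with one new report
    print(total.mean, total.variance, updated.mean, updated.variance)
```

Because `merge` is associative, the same function serves as the per-partition fold, the cross-partition combine and the streaming update.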
GEOSTAT: BIG DATA, NEAR REAL-TIME SITUATIONAL AWARENESS
Intersect Sentry’s GEOSTAT forensic analytic service continuously processes geospatial observation data to learn and statistically characterize global and regional patterns of life. These patterns define the historical norms of an area of interest, which in turn support real-time anomaly detection and event prediction.
In the maritime domain, observations include vessel position data reported via the Automatic Identification System (AIS), a rich source of information about the speed, location and direction of travel of more than 500,000 vessels around the world. At a global scale, the GEOSTAT service has processed up to a year of maritime data to perform statistical analysis across all oceans and waterways at varying geospatial and temporal scales. Global, regional and local patterns are characterized with associated probabilities, enabling inference of entity class, activity and destination. Deviations from, and temporal changes in, patterns of life are also analyzed.
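As a rough illustration of how location-conditioned probabilities can support inference of destination (or, equally, entity class or activity), the Python sketch below estimates P(label | cell) by relative frequency. The record shape `(cell_id, label)`, the example port names and the function names are assumptions for illustration; the source does not describe GEOSTAT’s estimator. Rerunning the same counts with coarser cell identifiers would give the regional and global scales mentioned above.

```python
"""Sketch: relative-frequency estimate of P(label | cell) from historical records."""
from collections import Counter, defaultdict


def conditional_distribution(records):
    """Return {cell_id: {label: probability}} from (cell_id, label) pairs."""
    counts = defaultdict(Counter)
    for cell_id, label in records:
        counts[cell_id][label] += 1
    result = {}
    for cell_id, label_counts in counts.items():
        total = sum(label_counts.values())
        result[cell_id] = {label: c / total for label, c in label_counts.items()}
    return result


def most_likely(dist, cell_id):
    """Return (label, probability) for the most probable label, or None if unseen."""
    labels = dist.get(cell_id)
    if not labels:
        return None
    label = max(labels, key=labels.get)
    return label, labels[label]


if __name__ == "__main__":
    history = [("C1", "Dover")] * 3 + [("C1", "Calais")] + [("C2", "Calais")] * 2
    dist = conditional_distribution(history)
    print(most_likely(dist, "C1"))  # Dover is the most common destination from C1
```

Relative frequencies are used here for simplicity; a real system would likely smooth sparse cells rather than trust a handful of observations.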
In Figure 2, the observation frequency and density of vessels observed through AIS are visualized as a fluctuating heat map of the English Channel. To create the heat map, the area of interest is first divided into small cells; the frequency for each cell is the number of unique vessels observed in it, and the density is that number of vessels per unit area of the cell. In addition, probability distributions of direction of travel and speed are calculated for each cell, revealing patterns of activity such as routes and shipping lanes (Figure 3). The same area of interest can also be overlaid with vessel flags (Figure 4), showing which countries’ ships predominate in the various routes and regions.
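The per-cell statistics described above can be sketched in a few lines of Python. This is an illustrative single-machine version under stated assumptions, not GEOSTAT itself: reports are hypothetical `(mmsi, lat, lon, sog_knots, cog_deg)` tuples, cells are fixed-size latitude/longitude boxes, frequency is the count of unique vessel identifiers per cell, density divides that count by the box’s area on a spherical Earth, and direction and speed are summarized as normalized histograms. The bin sizes are arbitrary choices.

```python
"""Sketch: per-cell vessel frequency, density, and course/speed distributions."""
import math
from collections import Counter, defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def cell_of(lat, lon, cell_deg):
    """Index of the lat/lon box containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def cell_area_km2(cell, cell_deg):
    """Area of a lat/lon box on a sphere: R^2 * dlon * (sin(lat2) - sin(lat1))."""
    lat1 = math.radians(cell[0] * cell_deg)
    lat2 = math.radians((cell[0] + 1) * cell_deg)
    return EARTH_RADIUS_KM ** 2 * math.radians(cell_deg) * (math.sin(lat2) - math.sin(lat1))


def normalize(counter):
    """Turn a histogram of counts into probabilities."""
    total = sum(counter.values())
    return {k: v / total for k, v in sorted(counter.items())}


def cell_statistics(reports, cell_deg=0.1, course_bin_deg=45, speed_bin_kn=5):
    vessels = defaultdict(set)
    courses = defaultdict(Counter)
    speeds = defaultdict(Counter)
    for mmsi, lat, lon, sog, cog in reports:
        cell = cell_of(lat, lon, cell_deg)
        vessels[cell].add(mmsi)  # unique vessels, not raw report counts
        courses[cell][int((cog % 360) // course_bin_deg)] += 1
        speeds[cell][int(sog // speed_bin_kn)] += 1
    stats = {}
    for cell, ids in vessels.items():
        stats[cell] = {
            "frequency": len(ids),
            "area_km2": cell_area_km2(cell, cell_deg),
            "density_per_km2": len(ids) / cell_area_km2(cell, cell_deg),
            "course_dist": normalize(courses[cell]),  # bin index -> probability
            "speed_dist": normalize(speeds[cell]),
        }
    return stats
```

Computing area on a sphere rather than treating degree cells as equal-sized matters at Channel latitudes, where a degree of longitude spans roughly two-thirds of its equatorial length.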
BIG DATA ENTITY PATTERNS OF LIFE SERVICE
To complement the aggregate patterns of activity, the PoL service analyzes observations at the entity level to generate Patterns of Life for individual actors. The goal of this analytic is to learn, for each entity, a probabilistic function that captures and quantifies recurring behavior in the entity’s state, for use in downstream predictions and anomaly detection.
The PoL service processed AIS data to generate a profile for each vessel based on its position and speed reports. A profile includes the entity’s most likely locations (hangouts), the revisit rate for each location, and the entity’s typical speed at each location. Figure 5 shows statistics generated by the service for the most common elapsed time between vessel detections, along with the vessel behavior at each location inferred from the reported speed.
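A per-vessel profile of this kind might be computed as in the Python sketch below. The definitions are assumptions chosen for illustration, not the service’s: reports are hypothetical `(t_seconds, lat, lon, sog_knots)` tuples for one vessel; a hangout is a grid cell ranked by its share of reports; a visit is a maximal run of consecutive reports in the same cell; the revisit interval is the median time between the starts of successive visits; typical speed is the median reported speed in the cell; and the most common elapsed time between detections is the mode of binned report gaps.

```python
"""Sketch: per-entity Pattern of Life profile from one vessel's AIS reports."""
import math
from collections import Counter, defaultdict
from statistics import median


def cell_of(lat, lon, cell_deg):
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def entity_profile(reports, cell_deg=0.1, top_k=3, gap_bin_s=60):
    reports = sorted(reports)  # tuples sort by timestamp first
    report_counts = Counter()
    speeds = defaultdict(list)
    visit_starts = defaultdict(list)
    previous_cell = None
    for t, lat, lon, sog in reports:
        cell = cell_of(lat, lon, cell_deg)
        report_counts[cell] += 1
        speeds[cell].append(sog)
        if cell != previous_cell:  # entering a cell starts a new visit
            visit_starts[cell].append(t)
        previous_cell = cell

    hangouts = []
    for cell, n in report_counts.most_common(top_k):
        starts = visit_starts[cell]
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        hangouts.append({
            "cell": cell,
            "share_of_reports": n / len(reports),
            "visits": len(starts),
            "median_revisit_s": median(gaps) if gaps else None,
            "median_speed_kn": median(speeds[cell]),  # low speed suggests loitering/port
        })

    # Most common elapsed time between consecutive detections, binned to gap_bin_s.
    elapsed = Counter(
        int((b[0] - a[0]) // gap_bin_s) * gap_bin_s
        for a, b in zip(reports, reports[1:])
    )
    common_gap_s = elapsed.most_common(1)[0][0] if elapsed else None
    return {"hangouts": hangouts, "most_common_gap_s": common_gap_s}
```

A usage example: a vessel that sits in port at near-zero speed, transits out at 12 knots and returns would show the port cell as its top hangout, with one revisit interval and a low median speed.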
CLUSTER ENTITY GRAPH SERVICE
Clustering is an unsupervised machine learning approach that organizes similar data objects into groups (or clusters). The clustering service consists of machine learning clustering algorithms that allow fast updates when entities change or when entities are added or deleted. These algorithms do not require the number of clusters to be specified, only a similarity threshold, and they allow an entity to belong to more than one cluster. If clusters overlap, the overlap is encoded as a weighted graph to uncover the interconnectedness of the entity set.
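The properties listed above (no preset cluster count, a similarity threshold, overlapping membership, incremental updates, and a weighted overlap graph) can be illustrated with a leader-style clustering sketch in Python. This is a stand-in technique chosen for illustration, not the service’s algorithm: each cluster is defined by a seed vector, a new entity joins every cluster whose seed meets the threshold (so memberships can overlap) and otherwise seeds a new cluster, and the overlap graph weights each cluster pair by its number of shared members. The similarity function `1 / (1 + distance)` and the class name are assumptions.

```python
"""Sketch: incremental, threshold-based overlapping clustering with an overlap graph."""
import math
from itertools import combinations


def similarity(a, b):
    """Illustrative similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


class OverlappingClusters:
    def __init__(self, threshold):
        self.threshold = threshold
        self.seeds = {}    # cluster_id -> seed feature vector
        self.members = {}  # cluster_id -> set of entity ids
        self._next_id = 0

    def add(self, entity_id, features):
        """Assign one entity incrementally; returns the cluster ids it joined."""
        joined = [cid for cid, seed in self.seeds.items()
                  if similarity(features, seed) >= self.threshold]
        if not joined:  # nothing similar enough: the entity seeds a new cluster
            cid = self._next_id
            self._next_id += 1
            self.seeds[cid] = features
            self.members[cid] = set()
            joined = [cid]
        for cid in joined:
            self.members[cid].add(entity_id)
        return joined

    def remove(self, entity_id):
        """Delete an entity; a changed entity is handled as remove + add."""
        for ids in self.members.values():
            ids.discard(entity_id)

    def overlap_graph(self):
        """Weighted edges {(c1, c2): shared member count} between clusters."""
        edges = {}
        for c1, c2 in combinations(sorted(self.members), 2):
            shared = len(self.members[c1] & self.members[c2])
            if shared:
                edges[(c1, c2)] = shared
        return edges
```

Only the new entity is compared against existing seeds, so an insertion costs one pass over the clusters rather than a full re-clustering; the trade-off is that results depend on arrival order, which a production algorithm would need to manage.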
Cluster properties are further correlated with additional metadata about the entities in order to statistically label each cluster. As the clusters evolve, the service detects changes between the previous and current state, such as merged, split, new and missing clusters, and generates operator alerts.
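Change detection between two clustering snapshots can be sketched as follows. The approach is an assumption for illustration: each snapshot maps a cluster id to a set of member ids, a previous and a current cluster “correspond” when their Jaccard overlap meets `min_overlap`, and the correspondences are then classified. A previous cluster with no counterpart is missing, one with several counterparts has split, a current cluster with no predecessor is new, and one with several predecessors is a merge.

```python
"""Sketch: classify cluster changes between two snapshots via Jaccard overlap."""


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_changes(previous, current, min_overlap=0.3):
    links = [(p, c) for p, pm in previous.items() for c, cm in current.items()
             if jaccard(pm, cm) >= min_overlap]
    prev_to_cur, cur_to_prev = {}, {}
    for p, c in links:
        prev_to_cur.setdefault(p, set()).add(c)
        cur_to_prev.setdefault(c, set()).add(p)

    alerts = []
    for p in previous:
        targets = prev_to_cur.get(p, set())
        if not targets:
            alerts.append(("missing", p))
        elif len(targets) > 1:
            alerts.append(("split", p, sorted(targets)))
    for c in current:
        sources = cur_to_prev.get(c, set())
        if not sources:
            alerts.append(("new", c))
        elif len(sources) > 1:
            alerts.append(("merged", sorted(sources), c))
    return alerts
```

The threshold controls sensitivity: a low value links clusters that share only a few members and therefore reports more splits and merges.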
Space situational awareness requires continuously tracking and monitoring all space objects across all orbital regimes to keep space assets safe from both adversaries and debris. Two-Line Element sets (TLEs) are data records containing a satellite’s identification and its latest orbital parameters. The service processes publicly available TLEs from the Joint Space Operations Center for all active Low Earth Orbit (LEO) space objects. Figure 6 shows the clusters discovered for LEO satellite payloads and the connections between them: red nodes represent satellites and green nodes represent clusters. LEO satellite clusters are formed from the satellites’ orbital characteristics, grouping satellites with similar orbits together. Edges (connecting lines) between red and green nodes indicate membership in that cluster; edges between green nodes are derived associations created when two clusters share a member.
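Before clustering, each TLE has to be turned into a numeric feature vector. The Python sketch below reads line 2 using the standard fixed column positions of the two-line element format and derives the semi-major axis from mean motion via Kepler’s third law. Using semi-major axis minus Earth’s equatorial radius as an approximate mean altitude, and choosing inclination, altitude and eccentricity as the clustering features, are illustrative assumptions rather than the service’s actual feature set. The example line is the widely published 2008 TLE for the International Space Station.

```python
"""Sketch: parse TLE line 2 into orbital features for clustering."""
import math

MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter
EARTH_RADIUS_KM = 6378.137     # WGS-84 equatorial radius

ISS_LINE2 = "2 25544  51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537"


def parse_tle_line2(line2: str) -> dict:
    if not line2.startswith("2 "):
        raise ValueError("expected TLE line 2")
    mean_motion_rev_day = float(line2[52:63])
    n_rad_s = mean_motion_rev_day * 2.0 * math.pi / 86400.0
    semi_major_km = (MU_EARTH_KM3_S2 / n_rad_s ** 2) ** (1.0 / 3.0)  # Kepler's third law
    return {
        "norad_id": int(line2[2:7]),
        "inclination_deg": float(line2[8:16]),
        "raan_deg": float(line2[17:25]),
        "eccentricity": float("0." + line2[26:33].strip()),  # decimal point is implied
        "arg_perigee_deg": float(line2[34:42]),
        "mean_anomaly_deg": float(line2[43:51]),
        "mean_motion_rev_day": mean_motion_rev_day,
        "semi_major_axis_km": semi_major_km,
        "mean_altitude_km": semi_major_km - EARTH_RADIUS_KM,  # rough, near-circular orbits
    }


def clustering_features(elements: dict) -> tuple:
    """Illustrative feature vector for orbit similarity."""
    return (elements["inclination_deg"], elements["mean_altitude_km"],
            elements["eccentricity"])


if __name__ == "__main__":
    iss = parse_tle_line2(ISS_LINE2)
    print(round(iss["mean_altitude_km"]), clustering_features(iss))
```

In practice the features would also be scaled to comparable ranges before computing similarity, since altitude in kilometers would otherwise dominate inclination in degrees.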
The clustering properties consist of the orbital parameters of each satellite. The entity metadata used to label the clusters included country, users, ground stations and launch sites. By correlating the orbital parameters with these entity attributes, properties of newly launched space objects can be inferred from their orbital parameters.
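Labeling clusters from metadata, and inferring attributes of a new object from the clusters it joins, might look like the Python sketch below. All names and rules are assumptions for illustration: `members` maps cluster id to member ids, `metadata` maps entity id to an attribute dictionary, a cluster’s label for an attribute is its most common value with that value’s share of members as confidence, and a newly launched object receives the confidence-weighted vote of the clusters it joins.

```python
"""Sketch: label clusters by metadata majority and infer a new object's attribute."""
from collections import Counter


def label_clusters(members, metadata, attribute):
    """Return {cluster_id: (most_common_value, share_of_members_with_it)}."""
    labels = {}
    for cid, ids in members.items():
        values = Counter(metadata[e][attribute] for e in ids
                         if attribute in metadata.get(e, {}))
        if values:
            value, count = values.most_common(1)[0]
            labels[cid] = (value, count / sum(values.values()))
    return labels


def infer_attribute(joined_clusters, labels):
    """Confidence-weighted vote over the labels of the clusters a new object joined."""
    scores = Counter()
    for cid in joined_clusters:
        if cid in labels:
            value, confidence = labels[cid]
            scores[value] += confidence
    if not scores:
        return None
    value, score = scores.most_common(1)[0]
    return value, score / sum(scores.values())
```

For example, with one cluster whose members are mostly from one country and a second cluster from another, a new object that joins only the first cluster inherits the first country’s label; one that joins both gets the label of the more homogeneous cluster, at reduced confidence.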
Raytheon is at the forefront of pattern of life discovery, detection, tracking and prediction due to its extensive investment in data, customer relationships, big data infrastructure and machine learning analytic development. Future efforts will incorporate Recurrent Neural Networks (RNNs) for time-series-specific predictions and deploy to the cloud to ensure horizontal scaling as data volume and velocity continue to increase. RNNs can capture and encode complex time-series feature representations that can outperform systems that encode temporal windows directly in the feature space. In addition to computational scaling, a cloud implementation provides easier access to analytic resources, machine learning models and their outputs.
– Christine Nezda
¹ Atwood, Chandler P., “Activity-Based Intelligence: Revolutionizing Military Intelligence Analysis,” Joint Force Quarterly 77 (2nd Quarter, April 2015), National Defense University Press.
² Apache Spark is a unified analytics engine for large-scale data processing (https://spark.apache.org).