Recent advances in computer technology are enabling scientists and engineers to solve more complex problems with Machine Learning (ML). 

Significant leaps in processing power, bit depth, caching and storage, along with the expansion of cloud-based services, give researchers virtually unlimited scaling of resources, bringing problems with millions of input attributes and thousands of potentially non-exclusive output classes within reach. One example is the convolutional neural network (CNN), a form of deep learning neural network often used in image processing. The 50-layer ResNet network, for instance, contains more than 20 million trainable parameters and was trained on a database of over one million images.1 New tools, such as Python®, TensorFlow™ and Matlab®’s Artificial Intelligence (AI) toolbox, have opened the door for scientists outside the machine learning field to build complex networks and solve a wider breadth of problems previously unexplored in the computational machine learning community.

Raytheon has leveraged both academic network models and commercially available datasets in its research on computational pattern recognition. One basic pattern recognition application is object detection. The difficulty is often not the complexity of the object itself, but the sheer volume of source data that must be analyzed. Time is required to acquire data, label examples, and then train the models using the labeled data. Consider an image of an airport where a model is required to determine the number of aircraft on the ground. A recent Raytheon training experiment used 100,000 image chips of airplanes and approximately 100,000 background image chips. Each chip was a single-band panchromatic image of 300×300 pixels. Using an Amazon Web Services (AWS™) G2.8 virtual machine (VM), batch training 32 images at a time for 10 weeks achieved an 80% probability of detection (Pd) at a false alarm rate (FAR) of 10E-5.
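The two figures of merit quoted above can be computed from a detector's scores on labeled image chips. The following is a minimal sketch, assuming FAR is measured as false alarms per background chip (the article does not state its FAR units); the function name, scores and threshold are hypothetical.

```python
def detection_metrics(scores, labels, threshold):
    """Return (Pd, FAR) for a score threshold.

    labels: 1 for a chip containing an aircraft, 0 for a background chip.
    Assumption: FAR is counted per background chip.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # True positives: aircraft chips scored at or above the threshold.
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    # False alarms: background chips scored at or above the threshold.
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    pd = tp / n_pos if n_pos else float("nan")
    far = fp / n_neg if n_neg else float("nan")
    return pd, far


# Toy example: five aircraft chips followed by four background chips.
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.55, 0.2, 0.1, 0.4]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]
pd, far = detection_metrics(scores, labels, threshold=0.5)
```

Raising the threshold lowers both Pd and FAR, which is why a detection result is quoted as a Pd at a stated FAR rather than as a single number.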

Within the defense industry, objects of interest and their images often pose limitations that are not addressed by commercial applications. The ability to rapidly analyze unique, camouflaged, fleeting and possibly threatening entities is increasingly important. Building and training representative models for these cases requires training data that cover the breadth of available object/image variations, environmental parameters, and characteristics of the sensors observing them (Figure 1). Typically, Raytheon’s customers’ data provide only a limited set of observations: data collected by individual sensors, across a common background, and with similar viewing geometries.2 Consequently, data are locally sparse, yielding training databases effectively smaller than nominal training sizes, which can then cause bias in the resulting models. Overcoming these limitations is a key focus area of Raytheon’s Machine Learning research, and several approaches to the problem are discussed in the following sections.

Figure 1: Object, Sensor and Environmental variations


Three commonly used ML training approaches are supervised, unsupervised and semi-supervised. Supervised training typically requires large amounts of labeled input data, where each training exemplar is tagged with a known output class. Discriminant functions, generated by modeling this mapping, are then used to assign (or infer) classes for new, unlabeled input examples. Unsupervised training uses unlabeled exemplars, which are not tagged with a known output class, so the discriminant functions must learn how the data cluster into classes. During operations, unlabeled data are assigned to the nearest data cluster. Unsupervised learning can bias the outcomes by generating clusters during training that are not representative of specific target classes. Semi-supervised training utilizes both labeled and unlabeled exemplars. While unlabeled data help estimate the data distribution, reducing the overall need for labeled data, labeled data are still required for class separation. Semi-supervised approaches are the training strategy least likely to introduce bias, as the cluster statistics are drawn predominantly from unlabeled data.
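The unsupervised case described above, learning clusters from unlabeled exemplars and then assigning new data to the nearest cluster, can be illustrated with a small k-means sketch. This is an illustration only, assuming one-dimensional features and two clusters; the data values and function names are hypothetical.

```python
def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))


def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on scalar features, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in centroids]
        for x in points:
            groups[nearest(x, centroids)].append(x)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


# Training: unlabeled exemplars only, no class tags.
train = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
centroids = kmeans_1d(train, centroids=[0.0, 6.0])

# Operations: a new sample is assigned to the nearest learned cluster.
new_sample_cluster = nearest(4.1, centroids)
```

Note that the result is a cluster index, not a class name. Nothing in the training data ties cluster 1 to a particular target class, which is the source of the bias risk mentioned above.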


Raytheon has developed a number of tools and approaches to maximize training efficiency and mitigate the effects of limited training exemplars, bad labels, and noisy data. These approaches include training with a mix of both labeled and unlabeled data, the use of generative adversarial networks (GANs) to train more effectively with limited data, and the generation of quantifiable evaluation strategies and metrics for assessing performance.


One approach to mitigating ML sample size requirements is to integrate Generative Adversarial Networks (GANs) into the training process.3 Classically, for detection and classification problems, ML techniques focused on discriminative models which generate a mapping from input attributes to output classes. Less attention has been paid to generative models that learn the joint probability between a set of input attributes and output classes.
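The distinction between the two model families can be written compactly. In the notation below, which is an assumption of this sketch rather than the article's own, x is the vector of input attributes and y is the output class.

```latex
% Discriminative model: learn the class posterior directly
p(y \mid x)

% Generative model: learn the joint distribution, then recover the posterior
p(y \mid x) = \frac{p(x, y)}{\sum_{y'} p(x, y')}
```

Because a generative model captures p(x, y), it can also be sampled to produce new examples of x, which is the property GANs exploit.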

GANs utilize what is best described as a two-player game between a discriminator network and a generator network. Iteratively, the generator creates synthetic examples and the discriminator decides whether these examples are real or fake (Figure 2). The generator creates the fake examples by transforming a noise source into synthetic data. As the game continues, the generator learns to produce more realistic examples and the discriminator improves its ability to separate real from fake examples. The generator is optimized on how well it maps the noise signal onto the training data, and the discriminator on how well it detects or classifies both the real and the synthetic data. During the process, the generator is learning the training data distribution. Ideally, the system is optimized when the discriminator is only 50% confident that the generator’s examples are fake; that is, when it can do no better than chance.
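The two-player game has a standard formulation in the Goodfellow et al. paper cited above. The symbols here follow that paper and do not appear in this article: D is the discriminator, G the generator, p_data the training data distribution, p_z the noise source and p_g the distribution of generated examples.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

% For a fixed generator, the optimal discriminator is
D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
\quad\Rightarrow\quad
D^*_G(x) = \tfrac{1}{2} \ \text{when } p_g = p_{\text{data}}
```

The last line is the 50% condition: once the generator reproduces the training data distribution, the best possible discriminator outputs one half for every example.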

Figure 2: Generative Adversarial Network (GAN) Workflow

Initially, Goodfellow’s GAN experiments were replicated to confirm that GANs reduce the training burden relative to deep learning neural networks trained without the GAN process.4 In addition to reproducing the reduced training burden for detection (two-class) problems, the GAN domain was expanded to classification problems with more than two classes. Figure 3 displays a series of experiments with the discriminator alone (blue diamonds and gray squares) and with the GAN process (green triangles) using different network sizes. All of the experiments used feedforward backpropagation neural networks and produced statistically similar results. The X axis represents the size of the network in network connections and the Y axis is the number of training iterations. At similar performance across all experiments, the GANs trained in approximately 10 times fewer iterations.

Figure 3: Significant reduction in training burden for GANs versus non-GANs


Another approach to reducing the required amount of labeled training data is to use unlabeled data with a semi-supervised learning algorithm. In many cases, unlabeled data are plentiful even though labeled data may be scarce. Semi-supervised learning algorithms can utilize partially labeled datasets, alleviating the heavy requirement for large amounts of labeled training data. The Raytheon Machine Learning Team is currently investigating the use of pseudo-labels: labels that are created automatically for unlabeled data using a partially trained network. At first glance, this approach may seem like learning what is already known. In other words, if a network exists that can correctly label the images, then we are already done. Alternatively, if the network generates erroneous pseudo-labels, how can these help to improve the network, since they contain precisely the same mistakes that the network would already make? Yet, surprisingly, pseudo-labels can significantly improve classification accuracy and, unlike other semi-supervised approaches, they are extremely simple to implement because they do not require any changes to the network architecture.

Pseudo-labeling is based on a theory known as Entropy Regularization, which assumes that data points exist in clusters: high-density pockets in some feature space. The pseudo-labels are created by a type of clustering algorithm that locates the decision boundaries separating these high-density clusters. Consider a small set of labeled data points for two classes in a 2D feature space, as shown in the top plot of Figure 4. The current decision boundary of the network, determined from sparsely labeled data, is shown as a dashed line. The middle plot includes additional points from our unlabeled dataset. The center of each point is shaded according to the unknown true class label and its outline is colored according to the current prediction of the network (pseudo-label). The training algorithm adjusts the weights of the network to accommodate as many of the data point assignments as possible. With a high density of points located in distinct clusters, this will move the decision boundary towards the true boundary shown in the bottom plot. At each step, pseudo-labels are re-evaluated, allowing them to flip to correct assignments as the network discovers clusters in the unlabeled data consistent with the labeled data.
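The re-evaluation loop described above can be sketched in a few lines. To stay self-contained, this illustration swaps the neural network for a nearest-centroid classifier on one-dimensional data and weights labeled and pseudo-labeled points equally; both are simplifying assumptions, and all names and values are hypothetical.

```python
def centroid(values):
    return sum(values) / len(values)


def predict(x, c0, c1):
    """Nearest-centroid prediction: class 0 or class 1."""
    return 0 if abs(x - c0) <= abs(x - c1) else 1


def pseudo_label_training(labeled, unlabeled, steps=5):
    """Return the final decision boundary and the pseudo-labels at each step."""
    # Initial model: fitted to the sparse labeled data only.
    c0 = centroid([x for x, y in labeled if y == 0])
    c1 = centroid([x for x, y in labeled if y == 1])
    history = []
    for _ in range(steps):
        # Re-evaluate pseudo-labels with the current model.
        pseudo = [predict(x, c0, c1) for x in unlabeled]
        history.append(pseudo)
        # Re-fit the model on labeled plus pseudo-labeled points.
        xs0 = [x for x, y in labeled if y == 0]
        xs0 += [x for x, p in zip(unlabeled, pseudo) if p == 0]
        xs1 = [x for x, y in labeled if y == 1]
        xs1 += [x for x, p in zip(unlabeled, pseudo) if p == 1]
        c0, c1 = centroid(xs0), centroid(xs1)
    return (c0 + c1) / 2, history


# One labeled point per class puts the initial boundary at 4.0,
# while the two clusters are actually separated near 3.0.
labeled = [(2.0, 0), (6.0, 1)]
unlabeled = [0.0, 0.5, 1.0, 1.5, 2.0, 3.9, 4.5, 5.0, 5.5, 6.0]
true_classes = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

boundary, history = pseudo_label_training(labeled, unlabeled)
```

In this toy run the point at 3.9 starts with the wrong pseudo-label, then flips once the centroids shift toward the dense clusters, and the boundary settles near 3.16.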

Figure 4: Top: sparse labeled data have ambiguous class boundaries. Middle: unlabeled data with pseudo-labels (outlines) added to the dataset. Bottom: re-training with pseudo-labels corrects decision boundaries based on data population density.

Historically, pseudo-labels have only been applied to unlabeled data. However, Raytheon has developed an informed pseudo-label algorithm that takes into account noisy labels based on a known or estimated probability of correctness. While extremely noisy labels have limited use with most supervised methods, we demonstrate that a high percentage of label errors may be tolerated using this approach. Using the MNIST (Modified National Institute of Standards and Technology) hand-written digit dataset, our method achieves greater than 98% accuracy even if 70% of the labels are chosen at random, and more than 95% accuracy if 90% of the labels are chosen at random. These results are competitive with recently published works.5,6
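The article does not give the details of the informed pseudo-label algorithm. As a rough illustration of the general idea only, the sketch below treats the noisy label as a prior with a known probability of correctness and combines it with the network's predicted class probabilities using Bayes' rule. This combination rule, the function name and the numbers are assumptions, not Raytheon's published method.

```python
def informed_pseudo_label(pred_probs, noisy_label, p_correct):
    """Combine a noisy label with network predictions.

    pred_probs:  predicted class probabilities from the current network.
    noisy_label: the (possibly wrong) label attached to the exemplar.
    p_correct:   known or estimated probability that noisy_label is right.
    Returns (pseudo_label, posterior).
    """
    k = len(pred_probs)
    # Prior from the noisy label: remaining mass spread over other classes.
    prior = [p_correct if c == noisy_label else (1.0 - p_correct) / (k - 1)
             for c in range(k)]
    joint = [p * q for p, q in zip(prior, pred_probs)]
    total = sum(joint)
    if total == 0.0:
        # Degenerate case: fall back to the label as given.
        return noisy_label, prior
    posterior = [j / total for j in joint]
    return max(range(k), key=lambda c: posterior[c]), posterior


# A weakly trusted label is overruled by a confident prediction...
weak_label, _ = informed_pseudo_label([0.2, 0.7, 0.1], noisy_label=0, p_correct=0.4)
# ...while a strongly trusted label is kept.
strong_label, _ = informed_pseudo_label([0.2, 0.7, 0.1], noisy_label=0, p_correct=0.9)
```

Under this formulation, setting p_correct to 1/K for K classes makes the prior uniform, so the exemplar is treated exactly like unlabeled data and receives an ordinary pseudo-label.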

When the proportion of labeled to unlabeled data is small, pseudo-labels can easily become unbalanced, resulting in all of the points being assigned to only a few classes or even to a single class. To counter this, previous approaches restricted the amount of unlabeled data that was used, which limits the amount of information available to the algorithm and, to an extent, performance. Raytheon ML scientists took a different approach, estimating the percentage of unlabeled data points in each class a priori and enforcing a more even split during the assignment of pseudo-labels. As shown in Figure 5, using this approach with the MNIST dataset compares favorably to previously published studies. In the figure, average error results from the Raytheon experiment with a convolutional neural network (CNN) and the same network using pseudo-labels (CNN+PL) are compared to a similar previous study by Lee using a neural network (NN) and a neural network with pseudo-labels (NN+PL).7
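One simple way to enforce a class split during pseudo-label assignment is a greedy, quota-based scheme: each class receives a quota from its estimated fraction, and the most confident predictions are assigned first until quotas run out. This is a hypothetical sketch of the balancing idea, not the algorithm used in the Raytheon experiment.

```python
import math


def balanced_pseudo_labels(probs, class_fractions):
    """Assign one pseudo-label per point subject to per-class quotas.

    probs:           per-point lists of predicted class probabilities.
    class_fractions: a priori estimate of each class's share of the data
                     (assumed to sum to 1).
    """
    n = len(probs)
    k = len(class_fractions)
    # Rounding up guarantees the quotas cover every point.
    quotas = [math.ceil(f * n) for f in class_fractions]
    # Consider every (point, class) pair, most confident first.
    candidates = sorted(
        ((probs[i][c], i, c) for i in range(n) for c in range(k)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    labels = [None] * n
    for _, i, c in candidates:
        if labels[i] is None and quotas[c] > 0:
            labels[i] = c
            quotas[c] -= 1
    return labels


# A poorly trained network that favors class 0 for every point.
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
naive = [max(range(2), key=lambda c: p[c]) for p in probs]
balanced = balanced_pseudo_labels(probs, class_fractions=[0.5, 0.5])
```

Here the naive argmax collapses all four points into class 0, while the quota-based assignment keeps the two most confident class 0 predictions and moves the two least confident ones to class 1.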

Figure 5: Average error rate (%) results for the MNIST dataset of hand-written digits using only a small number of labeled data points. Individual values represent the average error over 10 training trials, each with a different random labeled subset of MNIST. CNN+PL is the CNN trained using Raytheon’s balanced pseudo-label algorithm for the remaining “unlabeled” training data.


When failures occur in machine object recognition algorithms, researchers often have limited information on the root causes of the failure. For example, did an algorithm fail to detect an object due to occlusion, shadow, contrast, or another known computer vision shortcoming? Was the training data not representative of the test data? Is the algorithm fundamentally flawed? Modern ML algorithms like deep neural networks are particularly opaque and provide little information to ML engineers and analysts (Figure 6). Along with the tremendous benefits of AI and ML throughout industry comes the need for greater empirical insight into how these algorithms are performing.

Figure 6: Example results from ML algorithm 8,9

A primary underlying assumption of ML is that the relevant characteristic parameters of training data are representative of the data the algorithm will be tested with and required to discriminate in operation. Raytheon is developing a data-driven statistical confidence metric that will provide insight into some fundamental questions about ML, such as what is the accuracy of a model’s prediction given a classifier and sample of data, or which characteristics of the training data have the greatest influence on a model’s accurate prediction of Sample X.

While results on many classification problems have been impressive, Raytheon’s customers often have limited training data available for many applications, yet require metrics that attest to the veracity of an algorithm’s results. An innovative confidence metric based on examining scene and target statistics will provide insight into a model’s applicability to a specific test sample.
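The article does not define the confidence metric. To make the underlying representativeness assumption concrete, the sketch below uses a much simpler stand-in: it compares a test sample's summary statistics with those of the training set and reports how many standard deviations away each one lies. The choice of features (mean brightness and contrast) and all values are hypothetical.

```python
import statistics


def feature_z_scores(train_features, sample):
    """Z-score of each sample feature against the training distribution.

    train_features: list of feature vectors, one per training exemplar.
    sample:         feature vector for the test sample.
    """
    z = []
    for j, value in enumerate(sample):
        column = [row[j] for row in train_features]
        mean = statistics.mean(column)
        std = statistics.pstdev(column)
        if std > 0:
            z.append((value - mean) / std)
        else:
            # No spread in training: any deviation is unbounded.
            z.append(0.0 if value == mean else float("inf"))
    return z


# Hypothetical scene statistics: [mean brightness, contrast].
train = [[10.0, 0.2], [12.0, 0.4], [14.0, 0.6]]
z = feature_z_scores(train, sample=[12.0, 2.0])
```

In this example the brightness of the test sample matches the training data, but its contrast lies far outside the training range, suggesting that a model's prediction for this sample deserves less trust.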

Many challenges remain for the integration of ML into the systems and technologies of the defense industry, not the least of which is the ability to rapidly apply ML capabilities in dynamic environments with limited training data. Raytheon has made significant investment in this area through the Independent Research & Development (IR&D) of the approaches discussed in this article, and experimental results suggest that these techniques can provide meaningful performance improvements with customer datasets. Other methods under consideration include variational autoencoders; triplet loss; and image-to-image translation,10 where deep neural networks (DNNs) attempt to learn the values of one modality from another, such as electro-optical (EO) from Synthetic Aperture Radar (SAR).

The ability to effectively utilize machine learning algorithms in Raytheon products offers tremendous advantages to customer missions. Both machine learning and artificial intelligence remain a strong focus in the company’s technology roadmap as key capabilities for today’s defense systems.

– Steven A. Israel
– Philip Sallee
– Franklin Tanner
– Jon Goldstein
– Shane Zabel


1 Y. You, Z. Zhang, C.-J. Hsieh, J. Demmel and K. Keutzer, “ImageNet Training in Minutes,” arXiv:1709.05011v10 [cs.CV], 31 Jan 2018.

2 S. Israel and E. Blasch, “Chapter 5: Context Assumptions for Threat Assessment Systems,” in Context-Enhanced Information Fusion: Boosting Real-World Performance with Domain Knowledge, L. Snidaro, J. García, J. Llinas and E. Blasch, Eds., Springer International, 2016, pp. 99-124.

3 I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, “Generative Adversarial Networks,” arXiv:1406.2661v1, 9 pages, 2014.

4 S. Israel, J. Goldstein, J. Klein, J. Talamonti, F. Tanner, S. Zabel, P. Sallee and L. McCoy, “Generative Adversarial Networks for Classification,” in IEEE® Applied Imagery and Pattern Recognition Workshop: Big Data, Analytics, and Beyond, Washington, 2017.

5 I. Jindal, M. Nokleby and X. Chen, “Learning Deep Networks from Noisy Labels with Dropout Regularization,” in IEEE International Conference on Data Mining, 2016.

6 D. Rolnick, A. Veit, S. Belongie and N. Shavit, “Deep Learning is Robust to Massive Label Noise,” 31 05 2017. [Online]. Available: [Accessed 27 01 2018].

7 D. Lee, “Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks,” in Workshop on Challenges in Representation Learning, ICML, 2013.

8  F. Tanner, B. Colder, C. Pullen, D. Heagy, M. Eppolito, V. Carlan, O. C. and P. Sallee, “Overhead Imagery Research Data Set – An annotated data library and tools to aid in the development of computer vision algorithms,” in IEEE Applied Imagery Pattern Recognition Workshop, Washington D.C., 2009.

9 Images from U.S. Geological Survey Department of the Interior/USGS U.S. Geological Survey.

10 P. Isola, J.-Y. Zhu, T. Zhou and A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks,” in Computer Vision and Pattern Recognition, Honolulu, 2017.