Small Corps, Big Data

In 1941, the United States was made tragically aware of its lack of understanding of enemy capabilities and intentions. In contrast, our enemies knew a great deal, including the seemingly innocuous fact that U.S. sailors got shore leave on weekends. This information, with basic analysis, led to the logical conclusion that ships would be particularly vulnerable on a Sunday morning.1 Post-Pearl Harbor, few did not understand the need for as much information on strengths, weaknesses, and intentions as the intelligence community could gather.

During World War II, the predecessor of the Central Intelligence Agency, the Office of Strategic Services (OSS), experienced its first successes through research and analysis.2 Realizing that a great deal of information can be obtained from open sources, the head of the OSS, COL William Donovan, and the Librarian of Congress, Archibald MacLeish, agreed that the information desired already existed. Everything the country needed was available in libraries. “What was required, therefore, was an army of experts who best knew how to handle ‘the most powerful weapon in the OSS arsenal: the three-by-five index card.’”3

The amount of openly available information has done nothing but increase over the past 75 years. By the end of 2015, we can expect the equivalent data of 18 million Libraries of Congress to be available on the Internet.4 We’ve heard current times called the Information Age, Computer Age, and Digital Revolution. The staggering statistics justify such names. Every minute, 300 hours of video are uploaded just to YouTube.5 Some 370 million photos are uploaded daily to Facebook6 and Instagram.7 Global Internet usage reached over 3 billion people in 2014, with growth of almost 6,500 percent in Africa and 3,300 percent in the Middle East,8 areas of constant USMC attention. The statistics are as boundless as they are staggering, but the message is clear: there is a truly massive amount of data being created and posted to the Internet. The question becomes, then, what is the new “three-by-five card” and who is trained to handle it?

Anecdotes and success stories utilizing open source and social media are abundant from the national-level intelligence community and law enforcement. The utility of this skill is vast and well-documented, but it is not limited to just the strategic interests of the national intelligence community members. Tactical applications in support of a MEU or any other MAGTF are expansive, ranging from targeting to operational security and tactical intelligence preparation of the battlespace to humanitarian assistance/disaster relief. However, without the requisite expertise available to the MAGTF, we will never be able to fully capitalize on the abundance of available intelligence information.

Two central issues define our current lack of capability. First, we require Marines who possess the skills to collect, compile, and evaluate publicly available information. Second, we need to foster extensive interaction with intelligence community partners, academic institutions, nongovernmental organizations, and industry leaders who capitalize on open source, crowdsourcing,9 and the potential network of over 3 billion individuals.

Addressing the primary issue, the Marine Corps needs individuals with the expertise to locate and extract information available via open source. Expeditionary Force 21 declares intelligence as “an indispensable Marine Corps Warfighting Function” and calls for “… integrated synchronized management and employment of the ISR Enterprise, to include all intelligence disciplines.”10 MOSs exist to enable intelligence activity in signals intelligence (SIGINT), human intelligence (HUMINT), and geospatial intelligence (GEOINT). Open source intelligence is conspicuously absent from that list. Formally trained open source Marines do not exist.

Oftentimes, our collections requirements are fielded by a Marine who has the knowledge and access to operate software or query databases specific to their MOS. For open source, we can consider social media, and more generally the Internet, as the single largest “database” in existence. Yet we have no one who is an expert at querying it. Furthermore, our unclassified systems, already limited in bandwidth, block a large portion of its content due to cyberspace defense concerns. We send a random smattering of Marines to any of the dozens of weeklong courses that address open source. An “expert” Marine may even have several weeks of training. However, we have no unified standard, and we would never expect a collector in any other intelligence field to be functional with two weeks of training followed by minimal application. As a result, time and again we squander small sums of money, and billions of gigabytes of data are left virtually untouched.

There is an abundance of government off-the-shelf (GOTS) and commercial tools currently in use or development to assist with accessing open source, specifically social media.11 The wide variety of software packages and the lack of a unified solution are an issue, not unlike our absence of a training standard. These tools are absolutely essential to the analyst, and language software is also required. However, no tool can ever be comprehensive, and in isolation they are all insufficient. The Internet is massive and ever-changing. We cannot rely exclusively on the static social media sites or currently documented news outlets. Open source intelligence (OSINT) analysts will need to access blogs, local news, and forum discussions. We will expect these Marines to be able to pivot between disparate problem sets, one month in support of operations related to U.S. Africa Command and the next to U.S. Central Command. No two signal environments are the same, and so we teach our SIGINT Marines the theory to map, understand, and operate optimally in all environments. The same must be true for our OSINT Marines working in the public domain. We cannot expect enduring success when we train to a tool rather than to fundamental knowledge.

What we need are Marines with formal, extensive OSINT training and a corresponding MOS. As mentioned earlier, they must be trained in the data mining software suites and dashboards available to them; however, they also need to be trained to independently research and identify web sites and social media pertinent to their assigned area of operations. Marines must be able to implement open source collection techniques, conducting queries using sophisticated search methodologies, web scraping, and application programming interfaces. They must understand how their presence is detected during their collection efforts, what footprints are left behind, and how this detection is mitigated.
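As a rough illustration of what web scraping involves at its simplest, the sketch below pulls hyperlinks out of a page and keeps those whose link text matches a keyword list. It is a minimal example under stated assumptions, not a description of any fielded tool: the sample page, the example domains, and the names LinkCollector and find_links are all hypothetical, and the page is parsed from a string so that nothing is fetched from the Internet.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page address.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_links(html, base_url, keywords):
    """Return links whose visible text contains any keyword (case-insensitive)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    wanted = [k.lower() for k in keywords]
    return [(url, text) for url, text in parser.links
            if any(k in text.lower() for k in wanted)]


# Hypothetical page content standing in for a local news site.
SAMPLE_PAGE = """
<html><body>
<a href="/news/flood-closes-bridge">Flood closes main bridge</a>
<a href="/sports/results">Weekend results</a>
<a href="http://forum.example/thread/42">Road conditions after the flood</a>
</body></html>
"""

hits = find_links(SAMPLE_PAGE, "http://localnews.example", ["flood", "road"])
for url, text in hits:
    print(url, "|", text)
```

Real collection would add the pieces this sketch leaves out: retrieving pages, respecting site terms and intelligence oversight rules, and managing the footprint the collector leaves behind.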

After collecting data, our Marines need to be trained in basic data analytics, such as predictive analysis, pattern recognition, social network analysis, and statistics. Managing large quantities of data is a challenge not exclusive to OSINT. This is a deficiency acknowledged across Marine Corps intelligence disciplines and the solution created for one should service the full scope of the Marine Corps Intelligence, Surveillance, and Reconnaissance Enterprise (MCISRE).
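To make one of these techniques concrete, the sketch below computes degree centrality, a basic social network analysis measure that scores each account by the share of other accounts it is directly connected to. The interaction list and account names are invented for illustration, and treating every interaction as an undirected tie is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical interactions: each pair means one account mentioned or
# replied to the other. The names are placeholders, not real data.
INTERACTIONS = [
    ("acct_a", "acct_b"),
    ("acct_a", "acct_c"),
    ("acct_b", "acct_c"),
    ("acct_d", "acct_c"),
    ("acct_e", "acct_d"),
]


def degree_centrality(edges):
    """Return {node: fraction of the other nodes it is directly tied to}."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-references
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


scores = degree_centrality(INTERACTIONS)
# Highest score first; ties broken alphabetically for stable output.
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
for account, score in ranked:
    print(f"{account}: {score:.2f}")
```

In this toy network acct_c ranks first because it touches three of the four other accounts. A high score only flags an account for closer review by an analyst; it says nothing by itself about intent or importance.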

As with any intelligence MOS, analysts and their leadership need to understand the ethics, legal policies, and intelligence oversight specific—and sometimes unique—to their discipline. Traditional intelligence analysis skills such as critical thinking and report writing are equally essential. Finally, after weeks to months of training, our OSINT Marines must apply their skills, day in and day out, fully integrated in MCISRE.

These trained Marines can naturally be infused into the MAGTF intelligence centers (MICs) located at each garrison MEF. Detailed OSINT demands high bandwidth access to the Internet, and sensitive open source collection will, at times, require managing attribution. While packaging these capabilities into a deployable module is not impossible, the support can more practically be leveraged from a static architecture within the MICs. OSINT Marines will have the capability to support the full range of military operations by providing direct reach back support to forward deployed MAGTFs while also providing pertinent strategic/operational intelligence to commanding generals and their staffs.

Most importantly, from within the MIC, intelligence gathered by open source analysts can immediately be used to cue GEOINT, HUMINT, and SIGINT, providing a comprehensive all-source product to commanders. Oftentimes, discussions of OSINT focus on cases where, in isolation, open source provided near real time data to a commander. This is an unequivocally important function. However, true open source MCISRE integration offers even greater utility.

The ability to feed partner intelligence disciplines validates the human capital required to resource a new skill. Adhering to a zero-sum game for fleet manning, the creation of an open source cell of 10 to 15 Marines in each MIC requires other intelligence Marines to be trained and reallocated. Paradoxically, this reallocation of finite resources will increase capacity for the very MOSs from which it will pull personnel, as well as provide a currently nonexistent, uniquely OSINT capability. When an open source analyst, in the conduct of their work, encounters a potential SIGINT target, HUMINT source, or GEOINT tipper, they can directly feed their intelligence counterparts. MCISRE, as a whole, flourishes. Analogously, a SIGINT Marine does not interact with a local population to ask what radio frequencies adversaries use. HUMINT Marines, educated in how they may support other intelligence functions in the conduct of their profession, obtain and laterally provide this information. Not only does HUMINT provide its own unique and essential skills, but the other intelligence disciplines directly prosper from its existence. In many environments, OSINT may very well be the most consistent and greatest tool we have to assist or even establish our HUMINT, GEOINT, and SIGINT footholds. This is the mentality we must adopt.

The amount of useful information publicly available has been well documented. By accepting the massive complexity inherent to open source and acknowledging we cannot expect HUMINT, SIGINT, and GEOINT Marines to be masters of both their disciplines and OSINT, it is apparent that all of the MCISRE suffers in the absence of open source experts.

Apart from these OSINT experts, the second issue is a need for increased participation within the OSINT community of practice. The CIA is the intelligence community’s OSINT commodity manager, responsible for the Director of National Intelligence (DNI) Open Source Center (OSC).12 The Defense Intelligence Agency acts as commodity manager for Department of Defense OSINT activities through the Defense Open Source Council (DOSC).13 It is essential that the Marine Corps, while applying OSINT to the tactical edge, employs the strategic best practices of the OSC and operates in compliance with guidelines put forth by the DOSC. Through coordination via the Marine Corps Intelligence Activity, MCISRE leadership can ensure improvement of standards and adherence to policy.

Lastly, by its very nature, open source information is available to everyone. The less obvious conclusion to draw from this obvious statement is that Marine Corps interests often align not only with those of the intelligence community, but also with those of nongovernmental organizations and academic institutions. This is particularly applicable when it comes to humanitarian assistance/disaster relief. Crisis mapping is the near real time gathering and analysis of data relating to a natural disaster or conflict. Universities will assist in the collating and data processing for crisis mapping, taking advantage of crowdsourcing. By developing relationships with academic institutions engaged in these practices, the Marine Corps can leverage countless man hours of work in support of MAGTF operations.

The 2010 Haiti earthquake is cited as claiming 316,000 lives.14 San Diego State University’s Immersive Visualization Center, also known as the Viz Lab, acted as a primary source for processing imaging data for the U.S. Navy.15 Aerial images collected by U.S. Navy P-3 aircraft and satellite imagery were sent to the Viz Lab. These were then layered with imagery taken by civilians using handheld GPS-enabled devices. The resulting maps helped relief organizations locate refugee camps and avoid damaged infrastructure, and they informed the U.S. Navy of critical areas that required rebuilding.

This type of civilian effort is constantly growing as improved technology leads to increased crowdsourcing. UAViators is a group of volunteer civilian unmanned aircraft systems (UAS) pilots who provide support to humanitarian efforts. With their partnerships, they advertise over 900 vetted UAS pilots in over 60 countries.16 Marine Corps involvement with the OSINT community through academic partnerships drastically increases the amount of data which can be accumulated and processed through external resources to accomplish common goals.

Ultimately, by properly training our Marines to the standards presented above, we solve the new challenges of an old problem, namely open source intelligence. The Marine Corps is already behind the times with not only its lack of organic capability, but also its lack of integration with the government, nongovernment, and academic OSINT communities. Creating well-resourced OSINT cells within the MAGTF intelligence centers is a strong step in simultaneously remedying these deficiencies and reinforcing all intelligence disciplines.


1. Anthony Olcott, Open Source Intelligence in a Networked World, (New York: Bloomsbury Academic, March 2012).

2. Robin W. Winks, Cloak & Gown: Scholars in the Secret War, 1939–1961, (New Haven, CT: Yale University Press, March 1996), 63.

3. Olcott.

4. CenturyLink Business, “Finding High Ground as Big Data Swells,” 20 April 2015, accessed at

5. YouTube Press Statistics, 8 April 2015, accessed at

6. Zephoria Internet Marketing Solutions, “Top 15 Valuable Facebook Statistics,” 8 April 2015, accessed at

7. Press News, “Instagram Press Statistics,” 8 April 2015, accessed at

8. Internet World Stats, “World Internet Users and 2014 Population Statistics,” 12 April 2015, accessed at

9. Crowdsourcing is the process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers. Wikipedia dictionary at

10. Headquarters Marine Corps, Expeditionary Force 21, (Washington, DC: 4 March 2014), author emphasis.

11. Gregory Freeman and Robert Schroeder, “Social Media Exploitation: An Assessment, Common Operational Research Environment (CORE) Laboratory,” (Washington, DC: Department of Defense Analysis, Naval Postgraduate School, 2014).

12. Central Intelligence Agency, INTelligence: Open Source Intelligence, (Langley, VA: 2010), 18 April 2015, accessed at

13. Department of Defense, Department of Defense Instruction 3115.12, Open Source Intelligence (OSINT), (Washington, DC: 24 August 2010).

14. U.S. Geological Survey, Largest and Deadliest Earthquakes by Year: 1990–2014, (Washington, DC: Department of the Interior, 20 April 2015), accessed at

15. Gina Jacob, Viz Lab Helps on the Ground in Haiti, (San Diego, CA: San Diego State University, 20 April 2015), accessed at

16. UAViators, “Humanitarian UAV Network,” online blog, 21 April 2015, accessed at