The Computational Event Data Project

The Computational Event Data Project is a twenty-five-year project focused on the development and application of political event data; it has been variously funded by the National Science Foundation, Penn State University, and the University of Kansas. The project has three major research concentrations:

Software development for the machine-coding of political event data

The initial focus of the project was the development of techniques for converting English-language reports of political events into event data. This replaced the labor-intensive and error-prone process of human coding with inexpensive, transparent and reproducible automated coding. This effort produced the KEDS and later TABARI computer programs, as well as a number of utility programs for filtering texts and aggregating the resulting event data so that they can be used in statistical analysis.
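As a rough illustration of the aggregation step, the sketch below counts coded events per directed actor pair per month, which is one common way to turn an event stream into a series suitable for statistical analysis. It is not taken from the project's utility programs: the record layout, the actor labels, the single-letter event codes and the function name monthly_dyad_counts are all hypothetical placeholders.

```python
from collections import Counter
from datetime import date

# Hypothetical coded events: (date, source actor, target actor, event code).
# The actor labels and codes are placeholders, not actual KEDS/TABARI output.
events = [
    (date(1995, 1, 3), "ISR", "PAL", "A"),
    (date(1995, 1, 17), "ISR", "PAL", "B"),
    (date(1995, 1, 20), "PAL", "ISR", "A"),
    (date(1995, 2, 2), "ISR", "PAL", "A"),
]


def monthly_dyad_counts(events):
    """Count events per (year, month, source, target) key."""
    counts = Counter()
    for day, source, target, _code in events:
        counts[(day.year, day.month, source, target)] += 1
    return counts


counts = monthly_dyad_counts(events)
for key in sorted(counts):
    print(key, counts[key])
```

A real aggregation would more likely weight each event by a scale value attached to its code instead of simply counting; plain counts are used here to keep the example free of assumed scale values.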

Production of event data sets

Using the automated coding program, news reports from the Reuters and Agence France-Presse wire services, and the World Events Interaction Survey (WEIS) event coding scheme, we produced several event data sets that can be downloaded for use in political studies. Our primary focus has been on the Levant (Egypt, Israel, Jordan, Lebanon, the Palestinians and Syria), though we also produced short-term data sets for other regions, and we are currently providing support for the near-real-time GDELT data set. As part of an NSF-sponsored project that used event data to study mediation, we developed the CAMEO event and actor ontology to replace WEIS in our work.

Development of early warning methods

Most of our research has focused on the development of early warning techniques for political change, primarily using the Levant as a case study. We have experimented with a number of different methods, including factor analysis, discriminant analysis, an assortment of clustering algorithms, and, most recently, hidden Markov models.

This research was funded primarily by the U.S. National Science Foundation, including grants SES-9410023, SES-0096086, SES-0455158, SES-0527564, SES-0921027 and SES-1004414. Additional support has been provided by the University of Kansas General Research Allocation Fund and a Fulbright-Hays Faculty Research Abroad Fellowship.