Event Data Sets
Most of the recent data sets use the standard .zip file compression format. To access the older sets with the .sit suffix, download the StuffIt Expander software.
The Phoenix data set is produced by the Open Event Data Alliance using a whitelist of about 200 web sources and the open-source EL:DIABLO/PETRARCH coding system. All components in the data coding pipeline are transparent and openly available. Data are available in an experimental form (with a few gaps in coverage) for 20 June 2014 to 31 December 2014, and as a stable beta from 1 January 2015 to the present.
- Link to the Phoenix data set
- Link to the EL:DIABLO/PETRARCH system
- John Beiler's PyPhox toolset: Python tools for working with Phoenix
- Andrew Halterman's phoxy toolset: R tools for working with Phoenix
CAMEO (Conflict and Mediation Event Observations) is the new coding scheme we have developed in conjunction with our current research on third-party mediation. CAMEO has several new features not found in the WEIS system we used in our earlier work:
- The coding scheme is optimized for the study of mediation and contains a number of tertiary sub-categories specific to mediation.
- We have substantially expanded the categories for "use of force" and can therefore make much finer distinctions between reported levels of violence.
- We have combined a number of WEIS categories that, in our experience, cannot be reliably differentiated in machine coding.
- It provides an extensive hierarchical system for specifying substate actors.
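CAMEO's codes are hierarchical: event codes nest two-digit root categories, three-digit sub-categories and four-digit tertiary sub-categories, while actor codes are built by chaining three-character segments. The short Python sketch that follows shows how such codes can be split into their levels, for example to roll tertiary codes up to their root category. It assumes well-formed code strings; the function names are hypothetical helpers, not part of TABARI or the CAMEO dictionaries.

```python
# Hypothetical helpers for splitting hierarchical CAMEO codes into levels.
# Assumes event codes are 2-4 digit strings and actor codes are built from
# 3-character segments.


def event_code_levels(code: str) -> list[str]:
    """Return the chain of codes from the root category down to `code`."""
    if not code.isdigit() or not 2 <= len(code) <= 4:
        raise ValueError(f"unexpected CAMEO event code: {code!r}")
    # Prefixes of length 2, 3 and 4 give the root, sub-category and tertiary levels.
    return [code[:n] for n in range(2, len(code) + 1)]


def root_code(code: str) -> str:
    """Roll any event code up to its two-digit root category."""
    return event_code_levels(code)[0]


def actor_segments(code: str) -> list[str]:
    """Split an actor code such as 'ISRMIL' into its 3-character segments."""
    if not code or len(code) % 3 != 0:
        raise ValueError(f"unexpected CAMEO actor code: {code!r}")
    return [code[i:i + 3] for i in range(0, len(code), 3)]


print(event_code_levels("0211"))  # ['02', '021', '0211']
print(root_code("183"))           # 18
print(actor_segments("ISRMIL"))   # ['ISR', 'MIL']
```

Rolling codes up this way lets finely coded events be compared with data aggregated at the root-category level.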
Link to various copies of the CAMEO codebook
Download CAMEO .verbs dictionary for TABARI (the link displays the text of the file, which you can save from your browser)
Download CAMEO Levant .actors dictionary for TABARI (includes nouns, adjectives and international actors)
Download TABARI .options file with CAMEO event labels
Last update: 18 July 2008
This data set is a compilation of more than thirty years of WEIS-coded (to 2005) and CAMEO-coded data on events involving states in the Levant. The raw data (1979-2011), a tab-delimited file of dyadic Goldstein-scaled totals (1979-2004), and the coding dictionaries are available from this link.
This data set contains about 14,000 records of killings of five or more non-combatants anywhere in the world from January 1995 to January 2018, based on reports in six international news sources. The data and the context of their collection are described in detail on the linked page and in the codebook. Events are geocoded to approximately the city level when a location can be determined.
Older Data Sets
Most of the codes used in the data sets produced by the KEDS project before about 2002 are the standard WEIS codes originally developed by Charles McClelland (see "World Event/Interaction Survey (WEIS) Project, 1966-1978", ICPSR Study No. 5211). However, at various points we experimented with introducing new codes into WEIS, borrowing most of them from the PANDA project. We assigned these new codes weights comparable to those used in the Goldstein scale, and those weights are used in the aggregated data.
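To illustrate how scale weights feed into aggregated data, the following Python sketch sums Goldstein-style weights into monthly totals for each directed dyad (source, target). It is not the project's aggregation program: the (date, source, target, code) tuple layout is an assumption, and the codes and weights in the usage example are placeholders, not actual Goldstein or KEDS values.

```python
from collections import defaultdict
from datetime import date


def dyadic_monthly_totals(events, weights):
    """Sum event weights by (year, month, source, target).

    `events` is an iterable of (date, source, target, code) tuples and
    `weights` maps event codes to scale values. Events whose code has no
    weight are skipped (an assumption, not documented KEDS behavior).
    """
    totals = defaultdict(float)
    for day, source, target, code in events:
        weight = weights.get(code)
        if weight is None:
            continue
        # Directed dyad: ISR->LEB and LEB->ISR are accumulated separately.
        totals[(day.year, day.month, source, target)] += weight
    return dict(totals)


# Placeholder codes and weights for illustration only (not Goldstein values).
toy_weights = {"X": 2.0, "Y": -3.5}
toy_events = [
    (date(1990, 1, 5), "ISR", "LEB", "X"),
    (date(1990, 1, 20), "ISR", "LEB", "Y"),
    (date(1990, 1, 9), "LEB", "ISR", "X"),
    (date(1990, 2, 1), "ISR", "LEB", "X"),
]
print(dyadic_monthly_totals(toy_events, toy_weights))
```

Keeping dyads directed preserves the distinction between what one actor does to another and what it receives; summing both directions would give an undirected total instead.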
This data set covers Turkey for the period 3 January 1992 to 31 July 2006 using the CAMEO coding scheme. It is based on Agence France Presse reports.
This data set covers the states of the Gulf region and the Arabian peninsula for the period 15 April 1979 to 31 March 1999. Source texts before 10 June 1997 were located using a NEXIS search command designed specifically to return relevant data.
These files contain WEIS-coded event data for an assortment of Central Asian states, including Afghanistan, Armenia, Azerbaijan, Kazakhstan, Kyrgyzstan, Tajikistan, Uzbekistan and Turkmenistan, for the period May 1989 to July 1999. In addition to the lead-sentence coding, the "ALL" files include data retrieved from complete-story coding.
This data set contains WEIS-coded events for the major actors (including ethnic groups) involved in the conflicts in the former Yugoslavia from April 1989 through July 2003.
This data set contains WEIS-coded events for the major actors in West Africa from January 1989 through February 2002. The data were produced from full-story coding of Reuters articles. Most of the major opposition groups in the Liberian and Sierra Leone civil wars are included in the data.
This data set contains tab-delimited files for third-party "mediation episodes" in the Levant (April 1979 - December 1998) and the Balkans (June 1991 - May 1999). A mediation episode is defined as a specific mediator (e.g., the USA or the UN) meeting with both parties to a conflict within a period of a week; episodes are aggregated by month. These data were used in the paper "Analyzing the Dynamics of International Mediation Processes in the Middle East and the former Yugoslavia" by Deborah J. Gerner and Philip A. Schrodt.
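The episode definition can be sketched in Python as follows. This is one possible reading, not the procedure used to build these files: it assumes meetings are recorded as (date, mediator, party) tuples, treats "within a week" as two meetings fewer than seven days apart, counts episodes greedily without letting them share meetings, and tallies them by the month in which each episode starts. The function names and the toy meetings are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def mediation_episodes(meetings, mediator, party_a, party_b, window_days=7):
    """Return start dates of non-overlapping mediation episodes.

    `meetings` holds (date, mediator, party) tuples. An episode is counted
    when `mediator` meets one party and then the other within fewer than
    `window_days` days (an assumed reading of "within a week").
    """
    parties = {party_a, party_b}
    relevant = sorted(
        (day, party) for day, who, party in meetings
        if who == mediator and party in parties
    )
    window = timedelta(days=window_days)
    episodes = []
    i = 0
    while i < len(relevant):
        start, first_party = relevant[i]
        match = None
        # Look ahead for a meeting with the other party inside the window.
        for j in range(i + 1, len(relevant)):
            day, party = relevant[j]
            if day - start >= window:
                break
            if party != first_party:
                match = j
                break
        if match is None:
            i += 1
        else:
            episodes.append(start)
            i = match + 1  # greedy: later episodes cannot reuse these meetings
    return episodes


def episodes_by_month(episode_dates):
    """Count episodes by the (year, month) in which each one starts."""
    return Counter((d.year, d.month) for d in episode_dates)


# Toy meetings for illustration, not taken from the data set.
meetings = [
    (date(1995, 3, 1), "USA", "ISR"),
    (date(1995, 3, 4), "USA", "SYR"),
    (date(1995, 3, 20), "USA", "ISR"),
    (date(1995, 3, 30), "USA", "SYR"),
    (date(1995, 4, 2), "UN", "ISR"),
    (date(1995, 4, 10), "USA", "SYR"),
    (date(1995, 4, 15), "USA", "ISR"),
]
eps = mediation_episodes(meetings, "USA", "ISR", "SYR")
print(episodes_by_month(eps))
```

Here the meetings on 20 and 30 March are ten days apart and so do not form an episode, while the single UN meeting is ignored because only the USA is being tracked. Other readings of the rule (for example, a fixed calendar week, or allowing overlapping episodes) would give different counts.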
These data sets were generated to investigate interactions in regional conflicts. The project has traced events in several regional conflicts (each listed separately). Each data set is coded from Reuters lead sentences; the date ranges and numbers of events are given below. The files are organized by suffix:
- Files ending in .events contain the event data as coded by KEDS.
- Files ending in .actors are the actor lists for each region.
- Files ending in .verbs are the verb patterns that code to a WEIS category.
- Files ending in .options and .class are KEDS preference files (described in the KEDS manual).
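A minimal reader for the .events files might look like the following Python sketch. It assumes each line is tab-delimited with a six-digit YYMMDD date, source actor, target actor, WEIS code and an optional parenthesized label; check this layout against the actual files and the KEDS manual before relying on it. The two-digit-year pivot (50-99 map to the 1900s) and the latin-1 file encoding are also assumptions, and the `Event`, `parse_event_line` and `read_events` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    day: date
    source: str
    target: str
    code: str
    label: str = ""


def parse_event_line(line: str) -> Event:
    """Parse one assumed-format line: YYMMDD, source, target, code[, (label)]."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"too few fields: {line!r}")
    stamp, source, target, code = (f.strip() for f in fields[:4])
    if len(stamp) != 6 or not stamp.isdigit():
        raise ValueError(f"unexpected date stamp: {stamp!r}")
    yy, mm, dd = int(stamp[:2]), int(stamp[2:4]), int(stamp[4:])
    # Assumed pivot: 50-99 -> 1900s, 00-49 -> 2000s.
    year = 1900 + yy if yy >= 50 else 2000 + yy
    # Optional fifth field: a label in parentheses, stored without them.
    label = fields[4].strip().strip("()") if len(fields) > 4 else ""
    return Event(date(year, mm, dd), source, target, code, label)


def read_events(path: str) -> list[Event]:
    """Read every non-blank line of a .events file (encoding is an assumption)."""
    with open(path, encoding="latin-1") as fh:
        return [parse_event_line(line) for line in fh if line.strip()]
```

Raising on malformed lines, rather than silently skipping them, makes any mismatch between the assumed layout and the real files visible immediately.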
This site contains older versions of various data sets and software. We are no longer working with these, but they may be useful in replication studies.