A Very Very Short Introduction to Event Data Analysis

The Computational Event Data System is the current name for a series of projects beginning around 1998 that have focused on the machine coding of international event data using pattern recognition and simple grammatical parsing. These systems are designed to work with short news articles such as those found in wire service reports or chronologies. To date, the software has primarily been used to code events from the lead sentences of Reuters and Agence France Presse wire service reports, but in principle it can also be used with other event coding schemes.

The TABARI software will work with any input text containing a series of records in the following format:

[date]
[any other information]
[source text line 1]
[source text line 2]
...
[source text line n]
[blank line]

Source lines should be at most 80 characters long, and the last character in each line should be a blank. The source text is presently limited to 16 eighty-character lines containing a maximum of 1024 words. All characters are converted to uppercase, and all punctuation except commas is removed.
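As an illustration of the record layout and limits described above, here is a minimal Python sketch of a reader and checker for such a file. It is not part of TABARI. The names Record, parse_records, check_record and normalize are hypothetical, the date is treated as an opaque string, and the sketch assumes that "[any other information]" occupies exactly one line. The normalize function only approximates the conversion described above; the software's exact handling of characters such as hyphens or apostrophes may differ.

```python
from dataclasses import dataclass, field

# Limits taken from the description above.
MAX_LINE_LEN = 80
MAX_LINES = 16
MAX_WORDS = 1024


@dataclass
class Record:
    date: str                      # first line of the record, kept as-is
    info: str                      # second line (assumed to be a single line)
    text_lines: list = field(default_factory=list)  # source text lines


def _make_record(block):
    """Build a Record from the non-blank lines of one record."""
    date = block[0].strip()
    info = block[1].strip() if len(block) > 1 else ""
    return Record(date, info, block[2:])


def parse_records(text):
    """Split input text into records separated by blank lines."""
    records, block = [], []
    for raw in text.splitlines():
        if raw.strip() == "":
            if block:
                records.append(_make_record(block))
                block = []
        else:
            block.append(raw)      # keep trailing blanks for the checks below
    if block:
        records.append(_make_record(block))
    return records


def check_record(rec):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if len(rec.text_lines) > MAX_LINES:
        problems.append(f"more than {MAX_LINES} source lines")
    for i, line in enumerate(rec.text_lines, start=1):
        if len(line) > MAX_LINE_LEN:
            problems.append(f"line {i} is longer than {MAX_LINE_LEN} characters")
        if not line.endswith(" "):
            problems.append(f"line {i} does not end with a blank")
    if sum(len(line.split()) for line in rec.text_lines) > MAX_WORDS:
        problems.append(f"more than {MAX_WORDS} words")
    return problems


def normalize(line):
    """Approximate the conversion: uppercase, drop punctuation except commas."""
    return "".join(
        ch for ch in line.upper() if ch.isalnum() or ch.isspace() or ch == ","
    )
```

Running check_record on each parsed record before coding is one way to catch lines that are too long or that lack the trailing blank.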

We have been working with Reuters and Agence France Presse news service leads downloaded from the Factiva and Lexis-Nexis data services. We have used an assortment of simple filter programs to remove irrelevant information and format the text; the source code for these programs is available elsewhere at this site.
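To give a sense of what the formatting step of such a filter might involve, here is a small Python sketch that wraps one lead into the record layout described earlier. It is not one of the project's filter programs. The function name format_record and its parameters are hypothetical, and the sketch assumes the date and the information line have already been extracted from the downloaded text.

```python
import textwrap

MAX_LINE_LEN = 80   # maximum source line length, including the trailing blank
MAX_LINES = 16      # maximum number of source lines per record


def format_record(date, info, lead, width=MAX_LINE_LEN):
    """Return one record: date, info line, wrapped text lines, blank line."""
    # Collapse runs of whitespace, then wrap to width - 1 so that the
    # required trailing blank still fits within the line limit.
    cleaned = " ".join(lead.split())
    body = [line + " " for line in textwrap.wrap(cleaned, width=width - 1)]
    if len(body) > MAX_LINES:
        raise ValueError(f"lead needs {len(body)} lines; limit is {MAX_LINES}")
    # The final "\n\n" ends the last text line and adds the blank separator.
    return "\n".join([date, info, *body]) + "\n\n"
```

Concatenating the strings returned for each lead yields an input file in the expected layout. Removing irrelevant material from the downloads depends on the particular data service's output and is not attempted here.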

Click here to go to a more detailed introduction on using the coding system

Click here for a book-length treatment of event data generally, and here for a reasonably up-to-date survey of the literature.

Click here to go to a detailed history of the Computational Event Data Project