Utility Programs

This is a collection of peripheral programs designed to help process the event data produced by KEDS and TABARI. These programs help aggregate, filter, and display the data produced after KEDS or TABARI has coded source text. Not all of the utilities apply to both KEDS and TABARI; see the specific descriptions for details.

"events" package for R

Description: "Stores, manipulates, aggregates and otherwise messes with event data from KEDS/TABARI or any other extraction tool with similar output." The maintainer is Will Lowe (will.lowe@uni-mannheim.de). It is available from the CRAN R repository. For most people, this will be far more useful than KEDS_Count.

High-Volume Processing Suite

These Perl programs are the product of a project we were involved with in the summer of 2009 that involved, among other challenges, dealing with a 17.5 GB file containing about 7.6 million TABARI-formatted sentences. The file was not chronologically sorted, and about 50% of its records were duplicates.

This suite of programs was used to reduce this file to a set of smaller, chronologically sorted files from which most of the duplicates had been removed. These files were then coded in parallel on multiple nodes of a small cluster computer running Red Hat Linux (the 13-node machine coded 26 million sentences in about six minutes), and the resulting event files were combined.
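The actual suite is written in Perl and documented in Read.Me.txt. The Python sketch below only illustrates the general idea of the reduction step, under assumptions of our own: each record has already been parsed into a (date, text) pair with dates as YYYYMMDD strings, a duplicate is an exact match on date plus whitespace-normalized text, and the hypothetical function `dedupe_and_split` divides the surviving records into contiguous, chronologically ordered chunks, one per cluster node.

```python
import hashlib
from collections import defaultdict

def dedupe_and_split(records, n_chunks):
    """Hypothetical sketch: drop exact duplicates and split into chronological chunks.

    records: iterable of (date, text) pairs, date assumed to be a 'YYYYMMDD' string.
    Returns a list of n_chunks lists of (date, text), each in date order.
    """
    seen = set()
    by_date = defaultdict(list)
    for date, text in records:
        # Key on date plus whitespace-normalized text; store a digest to save memory
        normalized = " ".join(text.split())
        key = hashlib.md5((date + "\t" + normalized).encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        by_date[date].append(normalized)

    total = sum(len(v) for v in by_date.values())
    target = -(-total // n_chunks) if total else 0  # ceiling division
    chunks = [[] for _ in range(n_chunks)]
    i = 0
    # Whole dates go to one chunk, so every chunk covers a continuous time span
    for date in sorted(by_date):
        if len(chunks[i]) >= target and i < n_chunks - 1:
            i += 1
        chunks[i].extend((date, t) for t in by_date[date])
    return chunks
```

Because each chunk holds a continuous stretch of dates, the event files coded on the separate nodes can simply be concatenated in chunk order to give a single chronological file.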

As with many of our utility programs, these are working programs that were used to solve a specific problem, rather than general purpose utilities, and they will almost certainly need some modification in order to be used on subsequent projects. Nonetheless, they worked, and might be easier than starting anew. The programs are internally documented and the overall system, including the parallel processing, is described in the Read.Me.txt file.

Download TABARI High-Volume Suite (.zip)

Last update: 25 June 2009

CodeIndex

This program produces an index of the codes from a TABARI .actors or .verbs dictionary. The output is effectively the reverse of the TABARI dictionaries: these associate phrases with codes, whereas this associates codes with phrases. The output is formatted hierarchically using CAMEO actor and event coding conventions, and can be either a text file with indentation done using tabs, or an HTML table. The archive contains ANSI C source code, documentation, a compiled version for Mac OS-X, and sample input and output files.
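The core inversion can be sketched in a few lines of Python. This is not the CodeIndex source (which is ANSI C); it assumes the dictionary has already been parsed into (phrase, code) pairs, and it handles only CAMEO-style numeric event codes, treating a 2-digit code as a root category and each additional digit as one level deeper. The function name `code_index` and the sample phrases are illustrative.

```python
from collections import defaultdict

def code_index(pairs):
    """Hypothetical sketch: invert (phrase, code) pairs into a tab-indented code index."""
    index = defaultdict(list)
    for phrase, code in pairs:
        index[code].append(phrase)

    lines = []
    # String sorting puts '01' before '010' before '011' before '02',
    # which is the hierarchical order for CAMEO event codes
    for code in sorted(index):
        depth = max(len(code) - 2, 0)
        lines.append("\t" * depth + code)
        for phrase in sorted(index[code]):
            lines.append("\t" * (depth + 1) + phrase)
    return "\n".join(lines)
```

For example, `code_index([("SAY", "010"), ("COMMENT", "010"), ("APPEAL", "02")])` lists `010` with its two phrases indented beneath it, followed by `02` and `APPEAL`.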

Download CodeIndex (.zip)

Last update: 21 November 2005

scrubkeds

This program is a Perl script to clean up KEDS data: it is basically a set of regular expressions that fix some of the character and other formatting issues that one occasionally encounters in less-than-perfect event data that needs to be processed by pickier programs, notably R. Thanks to Justin Appleby and Patrick Brandt for this utility.
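The approach, an ordered list of regular-expression substitutions applied to the whole file, can be sketched as follows. The specific rules below are assumptions chosen for illustration and are not the expressions in scrubkeds; the names `RULES` and `scrub` are hypothetical. Trailing tabs are deliberately left alone so that empty final fields in tab-delimited records survive.

```python
import re

# Hypothetical cleanup rules, applied in order; the real scrubkeds rules may differ
RULES = [
    (re.compile(r"\r\n?"), "\n"),                # normalize DOS/old-Mac line endings
    (re.compile(r"[\u2018\u2019]"), "'"),        # curly single quotes -> straight
    (re.compile(r"[\u201c\u201d]"), '"'),        # curly double quotes -> straight
    (re.compile(r"[^\x00-\x7f]"), ""),           # drop any remaining non-ASCII characters
    (re.compile(r" +$", re.MULTILINE), ""),      # strip trailing spaces (not tabs) per line
]

def scrub(text):
    """Apply each substitution in RULES to the text, in order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Order matters here: the quote rules must run before the non-ASCII rule, or the curly quotes would be deleted rather than converted.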

scrubkeds code (perl)

scrubkeds log example (text file)

Last update: 2 November 2005

One-A-Day_Filter

Program for filtering event data using the rule that each dyad can have only one event per coding category per day (the "daily unique dyad-code rule"). This algorithm eliminates all events coded from duplicate stories, at the expense of a few false positives: genuinely distinct events with the same dyad and code on the same day are also reduced to a single event.
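The daily unique dyad-code rule amounts to keeping the first event seen for each (date, source, target, code) key. A minimal Python sketch, assuming events are tuples whose first four fields are date, source, target and code in that order, and treating the dyad as directed (source, target); `one_a_day` is a hypothetical name, not the C program's interface:

```python
def one_a_day(events):
    """Keep only the first event for each (date, source, target, code) key."""
    seen = set()
    kept = []
    for ev in events:
        key = tuple(ev[:4])
        if key not in seen:
            seen.add(key)
            kept.append(ev)
    return kept
```

Any extra fields after the code (for example the coded text) are carried along unchanged with the retained event.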

Source code for the program is ANSI C. Source code and a compiled version for the Macintosh are included; the code will compile under gcc and therefore can be used on any Linux or Unix system. The code is open-source under the GPL.

Download One-A-Day_Filter Macintosh OS-X version (.zip)

Download One-A-Day_Filter Macintosh OS-9 version (.sit)

Last update: 15 June 2005

Event_Filter

Program for filtering event data based on the frequency of events in an interval of time surrounding the event. The program holds the source, target and code frequencies for a set number of days (FILTER_WINDOW) of event data. After each new day of data has been read, the program goes back and evaluates the data in the days up to and including that day. Event records are retained if their relative frequency in the window exceeds the probability thresholds SOURCE_THRESHOLD (source only), TARGET_THRESHOLD (target given source), and EVENT_THRESHOLD (event given source and target); other records are discarded. The three thresholds can be set in the source code; the value of 5% (0.05) currently set for all three seems to give reasonable results for the Levant and Balkans.
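The three-threshold test can be sketched in Python as below. This is a simplified illustration, not the C program: it assumes days are integers, that the window of FILTER_WINDOW days is centered on each event's day, and it recounts the window for every event (quadratic time), whereas the original keeps running frequencies. `event_filter` is a hypothetical name; the threshold names and the 0.05 values come from the description above.

```python
SOURCE_THRESHOLD = 0.05   # P(source)
TARGET_THRESHOLD = 0.05   # P(target | source)
EVENT_THRESHOLD = 0.05    # P(code | source, target)
FILTER_WINDOW = 7         # days; assumed here to be centered on the event's day

def event_filter(events, window=FILTER_WINDOW, src_t=SOURCE_THRESHOLD,
                 tgt_t=TARGET_THRESHOLD, evt_t=EVENT_THRESHOLD):
    """events: list of (day, source, target, code) tuples with integer days."""
    half = window // 2
    kept = []
    for day, src, tgt, code in events:
        win = [e for e in events if abs(e[0] - day) <= half]
        # The event itself is in its window, so none of these counts is zero
        n_all = len(win)
        n_src = sum(1 for e in win if e[1] == src)
        n_dyad = sum(1 for e in win if e[1] == src and e[2] == tgt)
        n_evt = sum(1 for e in win if tuple(e[1:4]) == (src, tgt, code))
        if (n_src / n_all > src_t and n_dyad / n_src > tgt_t
                and n_evt / n_dyad > evt_t):
            kept.append((day, src, tgt, code))
    return kept
```

An isolated event is kept when nothing else falls in its window, since all three relative frequencies are then 1; the filter only removes records that are rare relative to other activity in the same period.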

Source code for the program is ANSI C. Source code and a compiled version for the Macintosh are included. The program is open-source under the GPL.

Download Event_Filter (.sit)

Last update: 28 June 2002

KEDS_Count

This folder contains a program that can be used to aggregate event data for use in a spreadsheet or statistical program. It will work with any event data that has tab-delimited fields, not just KEDS data. The documentation for the program is in the file KEDS_Count.doc in MS-Word 5.1a format. The program can count individual event types as well as produce interval-level scaled totals, and will do daily, weekly, biweekly, monthly and quarterly aggregations. Command files are included for converting WEIS events to the Goldstein (1992) scale and CAMEO version 0.9B5 events to a scale comparable to Goldstein's.
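The kind of aggregation KEDS_Count performs can be sketched for the monthly case in Python. Everything here is an assumption for illustration: records are `date<TAB>source<TAB>target<TAB>code` with dates as YYYYMMDD, and `SCALE` holds placeholder weights for made-up codes, standing in for a command file such as the WEIS-to-Goldstein conversion (whose actual values are not reproduced here). `monthly_totals` is a hypothetical name.

```python
from collections import defaultdict

# Placeholder code -> weight mapping for illustration only (not Goldstein values)
SCALE = {"C1": 1.0, "C2": -2.0}

def monthly_totals(lines, scale=SCALE):
    """Return per-dyad monthly event counts and scaled totals.

    Keys are (source, target, 'YYYYMM'); codes missing from `scale` count as 0.0.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for line in lines:
        date, src, tgt, code = line.rstrip("\n").split("\t")[:4]
        key = (src, tgt, date[:6])  # first six digits identify the month
        counts[key] += 1
        totals[key] += scale.get(code, 0.0)
    return dict(counts), dict(totals)
```

Other aggregation periods only change how the date is mapped to a key, for example to a week number instead of `date[:6]`.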

Version 3.0b1 is written in Java and should run on Macintosh, Windows and other systems with Java installed. It eliminates some bugs from earlier versions, and provides for the aggregation of multiple actors (for example PAL* and PSE*), a choice between individual files for each dyad or a single file containing all dyads, and labelling of dyads.
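Aggregating multiple actors with patterns such as PAL* and PSE* can be illustrated with shell-style wildcard matching from Python's standard `fnmatch` module. This is only a sketch of the idea, not how KEDS_Count implements it; the functions `actor_group` and `merge_actors`, the group label, and the (date, source, target, code) tuple layout are hypothetical.

```python
from fnmatch import fnmatchcase

def actor_group(code, patterns):
    """True if an actor code matches any wildcard pattern such as 'PAL*'."""
    return any(fnmatchcase(code, p) for p in patterns)

def merge_actors(events, patterns, label):
    """Replace source/target codes that match any pattern with a single group label."""
    out = []
    for date, src, tgt, code in events:
        src = label if actor_group(src, patterns) else src
        tgt = label if actor_group(tgt, patterns) else tgt
        out.append((date, src, tgt, code))
    return out
```

After relabelling, the merged actors form one dyad member and can be counted with the same aggregation code as any single actor.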

This has been pretty much superseded by Will Lowe's "events" package in R.

Download KEDS_Count 3.0b3 (.zip)
Last update: 15 April 2008

Older versions

Pascal (.sit) Note: There is a bug involving leap years in the daily aggregation routine; thanks to Patrick Brandt for detecting this.
Last update: 8 March 1998

C++ (.zip)
Last update: 14 April 2003

Aggregator v01.exe

This program takes the output of event data machine coding programs (such as KEDS or TABARI) and creates a table of the scaled events aggregated in the requested time period. The table is saved to a file. One of the more powerful features of the program is that it allows data to be simultaneously aggregated across a range of aggregation periods (minimum number of days to maximum number of days). The program also allows users to select a custom aggregation period in days. Other options include aggregating by 1 day, 2 days, 1 week, 2 weeks, 1 month, quarterly, semi-annually, and yearly. A sample output file is included.
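Aggregating across a range of periods simply repeats a fixed-period aggregation once for each period length. A minimal Python sketch, assuming the comma-delimited input has already been parsed into (day_number, scaled_value) pairs and that bins start at the earliest day in the data; `aggregate` and `aggregate_range` are hypothetical names, not Aggregator's interface.

```python
from collections import defaultdict

def aggregate(day_values, period):
    """Sum (day, value) pairs into consecutive bins of `period` days.

    Bins start at the earliest day; returns {bin_start_day: total}.
    Empty bins are omitted. day_values must be non-empty.
    """
    start = min(day for day, _ in day_values)
    bins = defaultdict(float)
    for day, value in day_values:
        bin_start = start + ((day - start) // period) * period
        bins[bin_start] += value
    return dict(sorted(bins.items()))

def aggregate_range(day_values, min_days, max_days):
    """Aggregate the same data at every period length from min_days to max_days."""
    return {p: aggregate(day_values, p) for p in range(min_days, max_days + 1)}
```

Calendar periods such as months or quarters have unequal lengths, so they would need date-based keys rather than the fixed-width bins used here.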

Download Aggregator v01.exe

Program Requirements:

Operating System: MS Windows
All files must be comma-delimited text files.

The program was written by G. Dale Thomas at the Department of Government, University of West Florida.
Email: Thomas628@aol.com

Last update: 4 February 2002

Factiva.postfilter.pl

This Perl program is used for post-filtering of stories downloaded via email from the Factiva news site. It is used to remove stories based on sets of words and phrases found in either the headline and lead, or the body of the article. It is useful for doing a secondary filtering on files that have already been downloaded. The output is a slightly reformatted version of the original file.
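The filtering rule, one phrase list checked against the headline and lead and another against the body, can be sketched in Python. This is not the Perl program: it assumes each story has already been split into a dict with hypothetical `headline`, `lead` and `body` keys, and it uses case-insensitive whole-word matching, which is our own choice. `compile_phrases` and `keep_story` are hypothetical names.

```python
import re

def compile_phrases(phrases):
    """Build one case-insensitive regex matching any phrase as whole words."""
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def keep_story(story, lead_phrases, body_phrases):
    """Return False if a lead phrase occurs in the headline or lead,
    or a body phrase occurs in the body; otherwise True."""
    if lead_phrases and compile_phrases(lead_phrases).search(
            story["headline"] + " " + story["lead"]):
        return False
    if body_phrases and compile_phrases(body_phrases).search(story["body"]):
        return False
    return True
```

The word boundaries keep a phrase such as "football" from matching inside "footballer"; for large files the patterns would be compiled once rather than per story.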

Download Factiva.postfilter.pl (.zip)

Last update: 24 March 2010

KEDS Display

This Macintosh program displays the KEDS 1982-1993 Middle East data set in a variety of formats. It is designed to provide an introductory demonstration of event data, for example in a classroom or a briefing. The program has extensive help screens and -- in theory -- is self-explanatory. This program only works in the Macintosh "Classic" operating system, not in OS-X.

Sample screens from the program

Download KEDS_Display