Computational Event Data System Software

What to expect from this software

Joel Spolsky has written an excellent essay titled Five Worlds differentiating various types of software. Penn State Event Data Project software falls into Spolsky's "internal" category -- it has been developed for a specific project and environment. This is quite distinct from the "shrinkwrap" category, where far more effort has been made to ensure that the software works correctly in multiple environments. Most of this software has, in fact, been used at multiple sites, but we haven't spent hundreds of hours -- or millions of dollars -- testing and fine-tuning it for that purpose. Adjust your "ease of use" expectations accordingly.

A more succinct version of this: open source software is "free as in beer" and "free as in speech" but also "free as in 'free puppies.'"

See the paragraphs below for a preview of each software category available from this website. You can also follow the links to learn more about each category.

TABARI

The TABARI (Text Analysis By Augmented Replacement Instructions) program is the most recent event coding program that the Penn State Event Data Project has produced. It is the open source C++ successor to the KEDS program, and it adds a number of capabilities not present in KEDS that facilitate parsing and grammatical recognition. The current version (0.8) is fully functional on the Linux and Macintosh operating systems; a somewhat older version is available for Windows.

Dictionaries

These are some general dictionaries that may be useful for text analysis projects outside of the TABARI framework, as well as current versions of several of the TABARI dictionaries. The individual data sets also include the dictionaries used to create the data.

Utility Programs

This collection of peripheral programs is designed to help process the events data produced by KEDS and TABARI. The programs help aggregate, filter, and display the data that result after KEDS or TABARI has coded source text. Not all of the utilities apply to both KEDS and TABARI; see the specific descriptions for more detail.
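
As a rough illustration of the kind of aggregation and filtering described above, here is a minimal Python sketch. It is not one of the utilities distributed here. The record layout (a date string, a source actor, a target actor, and an event code), the actor and event codes, and the name monthly_dyad_counts are all assumptions made for the example.

```python
from collections import Counter
from typing import NamedTuple


class Event(NamedTuple):
    # Hypothetical record layout, not the actual KEDS/TABARI output format.
    date: str    # "YYYYMMDD"
    source: str  # actor initiating the event
    target: str  # actor receiving the event
    code: str    # event code


def monthly_dyad_counts(events, actors=None):
    """Count events per (month, source, target).

    If `actors` is given, keep only events whose source and target
    are both in that set (the filtering step).
    """
    counts = Counter()
    for ev in events:
        if actors is not None and (ev.source not in actors or ev.target not in actors):
            continue
        month = ev.date[:6]  # "YYYYMM"
        counts[(month, ev.source, ev.target)] += 1
    return counts


events = [
    Event("19950301", "ISR", "PAL", "042"),
    Event("19950315", "ISR", "PAL", "112"),
    Event("19950402", "PAL", "ISR", "042"),
    Event("19950403", "USA", "ISR", "031"),
]
for key, n in sorted(monthly_dyad_counts(events, actors={"ISR", "PAL"}).items()):
    print(key, n)
```

Counting by month and directed dyad is only one possible aggregation; the same loop could sum a numeric scale value per event instead of counting.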

Text Filters

This collection of filters aids in the retrieval and formatting of internet-based news leads, and helps compile the data into an input file to be read by KEDS and TABARI. The processes involved in this task include downloading the lead sentences from a web-based source, ordering the information chronologically, and formatting the specific source codes and identifiers for interpretation by KEDS or TABARI. The tasks performed by each individual filter are explained on this page.
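
The ordering and formatting steps can be sketched in a few lines of Python. This is an illustration only, not one of the filters themselves: the header layout (a YYMMDD date, a source code, and a sequence number), the source code "REU", and the name format_leads are assumptions, and the real input format expected by KEDS or TABARI should be taken from their documentation. The download step is omitted.

```python
from datetime import date
from typing import NamedTuple


class Lead(NamedTuple):
    day: date
    source: str  # hypothetical source code, e.g. "REU"
    text: str    # the lead sentence


def format_leads(leads):
    """Sort leads chronologically and render one plain-text record per lead."""
    ordered = sorted(leads, key=lambda lead: lead.day)
    records = []
    for seq, lead in enumerate(ordered, start=1):
        # Assumed header: YYMMDD date, source code, zero-padded sequence number.
        header = f"{lead.day:%y%m%d} {lead.source}-{seq:04d}"
        records.append(f"{header}\n{lead.text.strip()}\n")
    # Separate records with a blank line.
    return "\n".join(records)


leads = [
    Lead(date(1995, 3, 2), "REU", "  Second lead. "),
    Lead(date(1995, 3, 1), "REU", "First lead."),
]
print(format_leads(leads))
```

Sorting on the parsed date rather than on the raw text avoids mis-ordering when sources write dates in different styles.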

Older programs

  • [circa 2000]
    C source code which implements a left-right hidden Markov model and the corresponding Baum-Welch maximum likelihood training algorithm, following the algorithms described by Rabiner (1989). As far as I know it works, but you'd be far better off using R.

  • [circa 1995]
    The KEDS (Kansas Event Data System) program was our first foray into the automation of events data. KEDS runs natively on Mac OS 6.0 or later and, unlike TABARI, has a nifty GUI. KEDS uses the same sparse-parsing principles as TABARI and provides somewhat greater flexibility (features of KEDS that were used only rarely were not included in TABARI), but as the "Classic" environment has been pretty much phased out of the Apple product line, it is now very difficult to use.