Why Learn Perl Programming?

(Philip Schrodt, from Political Science 907, November 2000)

[See August 2012 addendum at bottom of page]

Over the past couple of weeks, I've been experimenting with the computer language Perl ("Practical Extraction and Report Language"), which is designed specifically for text processing. Perl has only been widely available for about five years, but in that time has become very popular. Because it is optimized for text processing, programs that do useful things can be very short. For example, the following program counts the number of "22" WEIS codes in an event data file named CASIA.EVT:

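$ka=0;                                        # running count of matches
open (INFILE, "CASIA.EVT") or die "Cannot open CASIA.EVT: $!";
# count every line that contains a tab followed by "22"
while (<INFILE>) { if (m/\t22/) {++$ka;}}
print "Total events = $ka \n";
close INFILE;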

With a few more lines, this can be extended to count all 2-digit WEIS categories:

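$evtnum= 1;
while ($evtnum < 23) {                        # WEIS categories 01 through 22
  # re-open the file so each category is counted from the first line
  open (INFILE, "CASIA.ALL.EVT.3CODE") or die "Cannot open file: $!";
  $ka=0;
  $event = sprintf("%02d",$evtnum);           # zero-pad: 1 becomes "01"
  while (<INFILE>) { if (m/\t$event/) {++$ka;}}
  print "$event count = $ka \n";
  close INFILE;
  ++$evtnum;
}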
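
The loop above re-reads the file once for each category. As a rough sketch of an alternative, the same tallies can be collected in a single pass by using a Perl hash keyed on the category. The subroutine name and the sample records below are made up for illustration, and the sketch assumes that the event code is the only field in which two digits directly follow a tab.

```perl
use strict;
use warnings;

# Tally every 2-digit WEIS category in one pass over the input.
# Assumption: the event code is the first place where two digits follow a tab.
sub count_categories {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        # Capture the first two digits that follow a tab.
        $count{$1}++ if $line =~ m/\t(\d\d)/;
    }
    return %count;
}

# Invented sample records (date, source, target, code); not real event data.
my $sample = "001103\tUSA\tCHN\t223\n"
           . "001104\tCHN\tUSA\t031\n"
           . "001105\tUSA\tJPN\t22\n";
open(my $in, '<', \$sample) or die "Cannot open sample: $!";
my %count = count_categories($in);
close $in;

foreach my $code (sort keys %count) {
    print "$code count = $count{$code}\n";
}
```

On the sample records this prints "03 count = 1" and "22 count = 2". Unlike the loop above, it lists only the categories that actually occur, so a category with no events is simply absent rather than reported as zero.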

In a more complicated application, I wrote a text filter for NEXIS in about 150 lines of Perl; comparable filters in C and Pascal required about 1000 lines of code. Program modifications involving recognition of new patterns usually require only an additional line or two of code.
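
To illustrate why a new pattern usually costs only a line or two, here is a minimal sketch of a line filter. It is not the NEXIS filter described above: the rules, the subroutine name, and the sample text are all invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical clean-up rules; each rule is a single substitution.
# Recognizing a new pattern means adding one more line to clean_line().
sub clean_line {
    my ($line) = @_;
    $line =~ s/^\s+//;                # strip leading whitespace
    $line =~ s/\s+$//;                # strip trailing whitespace and newline
    $line =~ s/\s{2,}/ /g;            # collapse runs of whitespace
    $line =~ s/^Copyright\b.*$//i;    # drop copyright lines (invented rule)
    return $line;
}

# Invented sample input.
my @samples = ("  Talks   resumed  on Monday. \n",
               "Copyright 2000 Example Wire\n");
foreach my $text (@samples) {
    my $clean = clean_line($text);
    print "$clean\n" if length $clean;   # skip lines emptied by the filter
}
```

Only the first sample line survives, printed as "Talks resumed on Monday."; the second is emptied by the copyright rule and skipped.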

Advantages:

[All this assumes one already knows C, which was covered in POLS 907.]

  1. Most of the control structures and syntax of Perl are the same as in C.
  2. Perl does not require any of the headers and variable declarations used in C.
  3. Perl contains a large number of additional string-oriented functions and data structures not available in C.
  4. The pattern matching and substitution options are incredibly rich.
  5. Perl transparently interfaces with the operating system (at least in Unix) -- in other words, a Perl program can easily move, delete or rename files, fetch web pages, and the like. [The Macintosh OS X operating system is just Unix with a very fancy graphical interface, so Perl works the same way there.]
  6. Perl is open-source and freely available for Unix, Linux, Windows and Macintosh. It runs as part of the operating system on the KU Compaq/Digital Unix machines, in Linux, and in the new Macintosh OS X operating system.
  7. There is extensive documentation and source code available on the Web.
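
Items 4 and 5 in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the date format, the subroutine name, and the file names are hypothetical, and the files are created in a scratch directory so that nothing real is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Item 4: a substitution with captures; rewrites MM/DD/YYYY as YYYYMMDD.
sub iso_date {
    my ($text) = @_;
    $text =~ s{(\d\d)/(\d\d)/(\d{4})}{$3$1$2}g;
    return $text;
}
print iso_date("Filed 11/03/2000"), "\n";

# Item 5: file operations are built-in functions; no shell is needed.
my $dir = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
my $old = "$dir/events.txt";         # hypothetical file names
my $new = "$dir/events.bak";
open(my $out, '>', $old) or die "Cannot create $old: $!";
print $out "sample\n";
close $out;

rename($old, $new) or die "Cannot rename: $!";
print "renamed\n" if -e $new and not -e $old;   # -e tests for existence
unlink($new) or die "Cannot delete: $!";
print "deleted\n" unless -e $new;
```

The program prints "Filed 20001103", then "renamed" and "deleted". Each of rename and unlink returns false on failure, so the "or die" idiom reports the operating system's error message in $!.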

Caveats:

Perl comes out of the Unix community and a lot of the most powerful features of the language are based on Unix models, which will seem obscure until you become familiar with them. The advantage is that once you've learned the "regular expression" syntax for Perl, you can use much the same syntax in Unix tools such as grep and sed.

Disadvantages:

  1. Perl is an interpreted language, rather than a compiled language, so it is probably too slow for writing large programs. The speed seems fine on both Unix and the Mac, however -- the simple program above runs through a 30,000-line data file in less than a second on a Mac G3.
  2. This is a text-processing language, not a general purpose language.

Running Perl on Unix machines

  1. Enter a Perl program using pico or some other text editor
  2. Enter the command "perl" followed by the name of the program file (e.g. "perl kount.pl")
  3. If the program is out of control, the usual Unix "Control-C" will kill it.
  4. Use "man perl" to get an introduction to the on-line Perl manual; however, http://www.perldoc.com/ gives the same information in a Web-based format.

Running Perl on the Macintosh

Download "MacPerl" from www.macperl.com (it's free -- this is the open-source world) and install it. It contains an editor, or alternatively you can use BBEdit for the editor -- it gives nice syntax coloring -- and then switch to MacPerl to run the program.

Update for OS X: Perl is built into the system and doesn't need to be downloaded. Just run it in the Terminal application following the Unix instructions above.

Running Perl on Windows

Can't be too hard, but as you might expect given my attitude towards Windows, I haven't found a need for this. But it is available -- check the www.perl.com web site.

For further information:

Larry Wall, Tom Christiansen, and Jon Orwant. 2000. Programming Perl. (3rd edition) Cambridge: O'Reilly & Associates.
[this is known as the "camel book" and is the definitive guide to Perl. 1067 pages. Possibly more than you want to know.]

Randal Schwartz and Tom Christiansen. 1997. Learning Perl. (2nd edition) Cambridge: O'Reilly & Associates.
[covers the 30% of the language that is used most of the time]

http://www.perl.com/
[home page for the Perl enterprise]

http://www.perldoc.com/
[this links into full Perl documentation, complete with a search facility]

"An Instantaneous Introduction to Perl"
[by Michael Grobe, University of Kansas]

Addendum: So why did you switch to Python?? [August 2012]

Python first appeared on my radar in 2006 at one of those political text analysis workshops that was about 50/50 computer scientists and political scientists; a couple of the computer scientists noted that they had switched to Python because students picked it up more easily. In subsequent years I began seeing more frequent references to it, many people noted that it was better for large programs, and, increasingly, the students I encountered had learned Python rather than Perl.

Not exactly quick on the uptake, by spring 2011 I nonetheless decided to give it a try, and picked a fairly large project -- the PoliNER/CodeCatcher system -- to experiment with. And, indeed, that experience confirmed all of those general sentiments about Python, and then some. Python [conveniently] "thinks like I do" (to use someone's expression...), and in particular if I think of Python as mostly manipulating lists (cf: TABARI), it runs really, really fast even for a scripted language.

So as of Spring 2012, I'm trying to write all of the new programs for the project in Python, and this is also the language I'm suggesting that students learn for general text processing and filtering.

So why is this at the bottom of the page? Almost all of the arguments I've made above about perl apply with minor caveats to Python. Besides, I warned you at the top of the page.