A random mental walk.

Friday, October 13, 2006

Code Archeology

A few months ago I started on a project to create a program which will score Scantron-type bubble sheets.

Now, we already have a program that does just that. The code, written in FORTRAN 77, dates from the early '80s and was originally run on a Univac. I knew the young woman who wrote it. Over the years I've made a few small changes as we switched from a Univac to an IBM mainframe and had to accommodate more sheets, but the code is pretty much untouched.

Now, friends, FORTRAN, especially classic FORTRAN 77, can be impenetrable because variable names had to be 6 characters or fewer. (In programming, variables are like the unknowns in algebra.) If you've only got a few things to keep track of, short names like x, y, and z are fine. Nowadays programmers prefer longer, descriptive variable names such as current_value, cronbach_alpha, and kurtosis.

I'm a decent programmer. In theory, all I needed to do was make some changes and recompile the code for a PC. So much for theory: I didn't have a FORTRAN compiler, the code was mind-numbing, and because some new features were needed, I decided to write a new program rather than revise the old one.

I hauled out an 8-year-old C compiler and had a go. The usual statistical stuff was straightforward: averages, standard deviations, and the like. I had fun learning some statistics. (I now know that kurtosis, despite its name, is not a dermatological condition; it measures how heavy the tails of a distribution are. The measure of how much a bell-shaped curve "leans" is skewness.) Things were going fine until I hit Cronbach's Alpha.
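For reference, the coefficient is usually written as follows for a test of K items, where σ²_{Y_i} is the variance of the scores on item i and σ²_X is the variance of the students' total scores. (This is the common textbook notation, not necessarily the names used in the FORTRAN program.)

$$
\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_{i=1}^{K} \sigma^2_{Y_i}}{\sigma^2_X}\right)
$$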

In the course of investigating, I got to read Cronbach's reflections on the 50th anniversary of his proposing the calculation. (Being a true academic, he acknowledged that his "alpha coefficient" was a generalization of a Kuder-Richardson formula as he recounted the intellectual wrangling over its significance and misuse.) Based on various references I thought I had coded the calculation correctly, but my values of the alpha coefficient didn't agree with those from the current FORTRAN program.
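For right/wrong items such as bubble-sheet answers, the variance of item i (dividing by the number of students) is p_i q_i, where p_i is the proportion of students answering that item correctly and q_i = 1 − p_i. Substituting that into the alpha formula for a test of K items with total-score variance σ²_X gives the Kuder-Richardson formula 20 (KR-20), which is the sense in which alpha generalizes it:

$$
\text{KR-20} = \frac{K}{K-1}\left(1 - \frac{\sum_{i=1}^{K} p_i q_i}{\sigma^2_X}\right)
$$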

If you search the web for Cronbach's Alpha you'll find a ton of references, but after some 200 links I hadn't found any code. Most pages discussed its significance and how to interpret the results. All but one of the pages on calculating the coefficient described how to do it with a statistical package (SPSS, SAS, MINITAB, Stata, R, etc.). The one useful link, http://www.geolog.com/msmnt/malpha.htm, worked through an example problem, and I based my code on it.
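Here is a minimal sketch in C of the standard textbook formula, assuming the scores sit in a row-major array with one row per student and one column per item. It is not a transcription of the geolog.com worked example or of the FORTRAN routine; the function and parameter names (cronbach_alpha, pop_variance, nstudents, nitems) are hypothetical.

```c
#include <stdlib.h>

/* Population variance (divide by n) of n values spaced `stride` doubles apart. */
static double pop_variance(const double *x, int n, int stride)
{
    double mean = 0.0, ss = 0.0;
    for (int k = 0; k < n; k++)
        mean += x[k * stride];
    mean /= n;
    for (int k = 0; k < n; k++) {
        double d = x[k * stride] - mean;
        ss += d * d;
    }
    return ss / n;
}

/*
 * Hypothetical helper: Cronbach's alpha for a row-major score matrix,
 * where scores[s * nitems + i] is student s's score on item i.
 * Writes the result to *alpha and returns 0, or returns -1 if alpha is
 * undefined (fewer than two items, no students, or zero variance in the
 * total scores) or memory runs out. Alpha itself can legitimately be
 * negative, so it is not used as an error code.
 */
int cronbach_alpha(const double *scores, int nstudents, int nitems, double *alpha)
{
    if (nitems < 2 || nstudents < 1)
        return -1;

    /* Sum of the item variances, one column (item) at a time. */
    double item_var_sum = 0.0;
    for (int i = 0; i < nitems; i++)
        item_var_sum += pop_variance(scores + i, nstudents, nitems);

    /* Each student's total score, then the variance of those totals. */
    double *totals = malloc((size_t)nstudents * sizeof *totals);
    if (totals == NULL)
        return -1;
    for (int s = 0; s < nstudents; s++) {
        totals[s] = 0.0;
        for (int i = 0; i < nitems; i++)
            totals[s] += scores[s * nitems + i];
    }
    double total_var = pop_variance(totals, nstudents, 1);
    free(totals);

    if (total_var == 0.0)
        return -1;

    double k = (double)nitems;
    *alpha = (k / (k - 1.0)) * (1.0 - item_var_sum / total_var);
    return 0;
}
```

For example, four students answering three right/wrong items with rows {1,1,1}, {1,1,0}, {1,0,0}, {0,0,0} give item variances of 0.1875, 0.25, and 0.1875 (sum 0.625) and a total-score variance of 1.25, so alpha = 3/2 × (1 − 0.625/1.25) = 0.75. Because the same divisor appears in every variance, using the sample (n − 1) variance everywhere gives the same alpha; a mismatch between two programs would only come from mixing divisors or from some other difference in the inputs.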

I finally asked a few instructors for help. Next week I'll sit down with some of them and go over a sample calculation. Some are going to dig deep into their own libraries for an early text that explained how to do the calculation. (It seems that in the early days of computing, extending into the 1980s, many people had to code their own statistics or borrow code someone else had already written. Experienced coders and statisticians will surely be exasperated that I couldn't go from the description of the calculation to code, or that I didn't have confidence in my own coding skills, but I digress.) One Psychology instructor gave me the name of a textbook he knew had a code sample.

Quick like a bunny, I trotted over to the library only to find three empty shelves between H62.N and H62.V. The book I wanted, "Measurement, Design, and Analysis" by Pedhazur and Schmelkin (H62.P325), was missing. I suspect a vast right-wing conspiracy or alien abduction, but again I digress.

As fate would have it, on a nearby shelf there was a book entitled "Fortran Programming for the Behavioral Sciences" by Veldman (H52.V4), published in 1967. Sure enough, the book had code for the "alpha coefficient" and (what the bleep!) a flow chart and a sample program named "TESTAT". Neurons fired! TESTAT is the name of a routine inside the current FORTRAN scoring program. And, well, look at that: the code in Veldman's book was the same as the FORTRAN code in our scoring program.

Turning to the date-due slip in the back of the book, I saw that it had been checked out (and renewed) in 1983 and 1984. Code archeology indeed. Does it help? Stay tuned.
