(My language is "better" than your language!?)
The "Phonebook" benchmark for comparing programming languages

Note: This website is outdated. It was set up for a study performed in 1999. The study would be worth repeating (for newer languages such as Ruby, Groovy, Scala etc. as well as for improved implementations of the languages). If you are interested in making a repetition happen, contact me (Lutz Prechelt, prechelt@inf.fu-berlin.de).
The critical aspect is generating enough publicity to collect a sufficiently large number of implementations of the task, but doing so in such a way that the average skill of the programmers is the same for each language (rather than collecting implementations from a lot of wizards and freaks in language A and a lot of beginners in language B, which would invalidate any findings).
Please do NOT start such advertising right away. We need to properly update this website and perform some planning first.

In the context of a controlled experiment on a different issue, I have recently obtained several dozen different implementations of the same program written in Java, C, or C++. A comparison of these programs found quite interesting results (see "Comparing Java vs. C/C++ efficiency differences to inter-personal differences", Communications of the ACM 42(10):109-112, October 1999): Although the differences in memory consumption and runtime between Java and C/C++ were quite large, the differences between the individual implementations within each language were even larger.

I believe it would be tremendously interesting to see corresponding results for many more languages, in particular scripting languages, because almost all benchmarks I have seen so far rely on but a single implementation (per language) of each program.

Hence, the purpose of this website is collecting many implementations of this same program in scripting languages for comparing these languages with each other and with the ones mentioned above. The languages in question are

Perl
Python
Rexx
Tcl

The properties of interest for the comparison are

programming effort
program length
program readability/modularization/maintainability
elegance of the solution
memory consumption
run time consumption
correctness/robustness

Interested?

If you are interested in participating in this study, please create your own implementation of the Phonecode program (as described below) and send it to me by email.

I will collect programs until December 18, 1999. After that date, I will evaluate all programs and send you the results. The effort involved in implementing phonecode depends on how many mistakes you make along the way. In the previous experiment, very good programmers typically finished in about 3 to 4 hours, while average ones typically took about 6 to 12 hours. If anything went badly wrong, it took much longer, of course; the original experiment saw times over 20 hours for about 10 percent of the participants. On the other hand, the problem should be much easier to solve in a scripting language than in Java/C/C++, so you can expect much less effort than indicated above.

Still interested?

Great! The procedure is as follows:

Read the task description for the "phonecode" benchmark. This describes what the program should do.
Download
- the small test dictionary test.w,
- the small test input file test.t,
- the corresponding correct results test.out,
- the real dictionary woerter2,
- a 1000-input file z1000.t,
- the corresponding correct results z1000.out,
- or all of the above together in a single zip file.
Fetch this program header, fill it in, convert it to the appropriate comment syntax for your language, and use it as the basis of your program file.
Implement the program, using only a single file.
(Make sure you measure the time you take separately for design, coding and testing/debugging.) Once running, test it using test.w, test.t, test.out only, until it works for this data. Then and only then start testing it using woerter2, z1000.t, z1000.out.
This restriction is necessary because a similar ordering was imposed on the subjects of the original experiment as well. Using the large data earlier would not be helpful anyway.
A note on testing:
- Make sure your program works correctly. When fed with woerter2 and z1000.t it must produce the contents of z1000.out (except for the ordering of the outputs). To compare your actual output to z1000.out, sort both and compare line by line (using diff, for example).
- If you find any differences, but are convinced that your program is correct and z1000.out is wrong with respect to the task description, then re-read the task description very carefully. Many people misunderstand one particular point.
  (I absolutely guarantee that z1000.out is appropriate for the given requirements.)
  If (and only if!) you still don't find your problem after re-reading the requirements very carefully, then read this hint.
Submit your program by email to prechelt@ira.uka.de, using
Subject: phonecode submission
and preferably inserting your program as plain text (but watch out so that your email software does not insert additional line breaks!)
Thank you!
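
The output comparison described in the note on testing above (sort both outputs, then compare) can also be done with a small helper script instead of sort and diff. The following is only a minimal sketch in modern Python 3, which is not the Python 1.5.2 required for submissions and is not part of the study materials. The names read_lines and compare_outputs, the script name compare_out.py, and the Latin-1 file encoding are assumptions made for illustration.

```python
import sys
from collections import Counter


def read_lines(path):
    """Return the lines of a file without their trailing newlines.

    Latin-1 is an assumption; it decodes any byte sequence without error.
    """
    with open(path, encoding="latin-1") as f:
        return [line.rstrip("\n") for line in f]


def compare_outputs(actual_lines, expected_lines):
    """Compare two outputs while ignoring the order of the lines.

    Returns (missing, unexpected): sorted lists of the lines that occur
    in the expected output but not in the actual one, and vice versa.
    Duplicates are counted, so printing a line twice differs from
    printing it once.
    """
    actual = Counter(actual_lines)
    expected = Counter(expected_lines)
    missing = sorted((expected - actual).elements())
    unexpected = sorted((actual - expected).elements())
    return missing, unexpected


# Hypothetical usage: python compare_out.py my_output.txt z1000.out
if __name__ == "__main__" and len(sys.argv) == 3:
    missing, unexpected = compare_outputs(
        read_lines(sys.argv[1]), read_lines(sys.argv[2])
    )
    for line in missing:
        print("missing:    " + line)
    for line in unexpected:
        print("unexpected: " + line)
    if not missing and not unexpected:
        print("outputs match (ignoring order)")
```

Counting lines with a multiset rather than a plain set is a deliberate choice here: it reports a result that was printed too often or too rarely, which a set comparison would hide.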

Constraints

Please make sure your program runs on Perl 5.003, Python 1.5.2, Tcl 8.0.2, or Rexx as of Regina 0.08g, respectively. It will be executed on a Solaris platform (SunOS 5.7), running on a Sun Ultra-II, but should be platform-independent.
Please use only a single source program file, not several files, and give that file the name phonecode.xx (where xx is whatever suffix is common for your programming language).
Please do not over-optimize your program. Deliver your first reasonable solution.
Please be honest with the work time that you report; there is no point in cheating.
Please design and implement the solution alone. If you cooperate with somebody else, the comparison will be distorted.

Note that this web site will close down on December 18, 1999.

Lutz Prechelt, prechelt@ira.uka.de, Last modified: Thu Nov 18 12:54:06 MET 1999
