(My language is "better" than your language!?)
The "Phonebook" benchmark for comparing programming languages
Note: This website is outdated. It was set up for a study performed in 1999.
The study would be worth repeating (for newer languages such as Ruby, Groovy,
Scala etc. as well as for improved implementations of the languages).
If you are interested in making a repetition happen, contact me (Lutz Prechelt,
prechelt@inf.fu-berlin.de).
The critical aspect is generating enough publicity to collect a sufficiently
large number of implementations of the task, while doing so in such a way that
the average skill of the programmers is the same for each language (rather
than collecting implementations from a lot of wizards and freaks in language
A and a lot of beginners in language B, which would invalidate any findings).
Please do NOT start such advertising right away.
We need to properly update this website and perform some planning first.
In the context of a controlled experiment on a different issue, I have
recently obtained several dozen different implementations of the
same program written in Java, C, or C++.
A comparison of these programs found quite interesting results (see
"Comparing Java vs. C/C++ efficiency differences to inter-personal
differences",
Communications of the ACM 42(10):109-112, October 1999):
Although the differences in memory consumption and runtime
between Java and C/C++ were quite large, the differences between the
individual implementations within each language were even larger.
I believe it would be tremendously interesting to see corresponding
results for many more languages, in particular scripting languages,
because almost all benchmarks I have seen so far rely on only a single
implementation of each program per language.
Hence, the purpose of this website is collecting many implementations of
this same program in scripting languages for comparing these languages
with each other and with the ones mentioned above.
The languages in question are
- Perl
- Python
- Rexx
- Tcl
The properties of interest for the comparison are
- programming effort
- program length
- program readability/modularization/maintainability
- elegance of the solution
- memory consumption
- run time consumption
- correctness/robustness
Interested?
If you are interested in participating in this study, please
create your own implementation of the Phonecode
program (as described below) and send it to me by email.
I will collect programs until December 18, 1999. After that
date, I will evaluate all programs and send you the results.
The effort involved in implementing phonecode
depends on how many mistakes you make along the way.
In the previous experiment, very good programmers typically finished
in about 3 to 4 hours, while average
ones typically took about 6 to 12 hours. If anything went badly wrong,
it took much longer, of course; in the original experiment, about
10 percent of the participants needed more than 20 hours.
On the other hand, the problem should be much easier to solve in a scripting
language than in Java/C/C++, so you can expect much less effort
than indicated above.
Still interested?
Great! The procedure is as follows:
- Read the task description for
the "phonecode" benchmark. This describes what the program should
do.
- Download
- the small test dictionary test.w,
- the small test input file test.t,
- the corresponding correct results test.out,
- the real dictionary woerter2,
- a 1000-input file z1000.t,
- the corresponding correct results z1000.out,
- or all of the above together in a single
zip file.
- Fetch this program header, fill it in,
convert it to the appropriate comment syntax for your language,
and use it as the basis of your program file.
- Implement the program, using only a single file.
(Make sure you measure the time you take separately for design,
coding and testing/debugging.)
Once your program runs, test it using test.w, test.t, and test.out only,
until it works for this data. Then, and only then, start testing it
using woerter2, z1000.t, and z1000.out.
This restriction is necessary
because a similar ordering was imposed on the subjects of the original
experiment; besides, using the large data earlier is not helpful
anyway.
- A note on testing:
- Make sure your program works correctly. When fed with woerter2
and z1000.t, it must produce the contents of
z1000.out (except for the ordering of the outputs).
To compare your actual output to z1000.out, sort both and
compare them line by line (using diff, for example).
- If you find any differences, but are convinced that your
program is correct and z1000.out is
wrong with respect to the task description, then re-read the task description
very carefully. Many people misunderstand one particular point.
(I absolutely guarantee that z1000.out is appropriate for the
given requirements.)
If (and only if!) you still cannot find your problem after re-reading
the requirements very carefully, then read this
hint.
- Submit your program by email to
prechelt@ira.uka.de, using the subject line
"phonecode submission", and preferably insert your
program as plain text (but make sure that your email software does
not insert additional line breaks!).
- Thank you!
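The testing rule from the procedure above (sort your output and z1000.out, then compare them line by line) can also be scripted. The following is a minimal sketch, not part of the original study materials: the script name compare_output.py and the functions diff_ignoring_order, read_lines, and main are hypothetical, it assumes the files can be decoded as Latin-1 and that line-ending differences should be ignored, and it requires Python 3 (it is a checking aid, not a submission, so the Python 1.5.2 constraint does not apply to it).

```python
import difflib
import sys


def diff_ignoring_order(actual, expected):
    """Return unified-diff lines between two outputs after sorting both.

    An empty list means both outputs contain exactly the same lines
    (including duplicates), regardless of the order they were produced in.
    """
    return list(difflib.unified_diff(
        sorted(expected), sorted(actual),
        fromfile="expected (sorted)", tofile="actual (sorted)",
        lineterm=""))


def read_lines(path):
    # Assumption: Latin-1 decodes any byte sequence, so the script never fails
    # on unusual characters in the dictionary or output files.
    with open(path, encoding="latin-1") as f:
        # Drop trailing "\r"/"\n" so line-ending style does not count as a difference.
        return [line.rstrip("\r\n") for line in f]


def main(argv):
    """Compare ACTUAL against EXPECTED; return exit status 0 on a match, else 1."""
    actual_path, expected_path = argv
    differences = diff_ignoring_order(read_lines(actual_path), read_lines(expected_path))
    for line in differences:
        print(line)
    return 1 if differences else 0


# Run as a command only when given exactly two file names.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1:]))
```

For example, `python3 compare_output.py my_z1000.out z1000.out` prints nothing and exits with status 0 if both files contain the same lines in any order; otherwise it prints the sorted lines that differ, marked "-" for z1000.out and "+" for your output. Sorting both sides before diffing is what makes the comparison insensitive to output order while still catching missing, extra, or duplicated lines.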
Constraints
- Please make sure your program runs on
Perl 5.003,
Python 1.5.2,
Tcl 8.0.2,
or Rexx as of Regina 0.08g,
respectively.
It will be executed on a Solaris platform (SunOS 5.7),
running on a Sun Ultra-II, but should be platform-independent.
- Please use only a single source program file, not several files,
and give that file the name phonecode.xx (where xx is
whatever suffix is common for your programming language).
- Please do not over-optimize your program. Deliver your first
reasonable solution.
- Please be honest with the work time that you report; there is no
point in cheating.
- Please design and implement the solution alone.
If you cooperate with somebody else, the comparison will be
distorted.
Note that this web site will close down on December 18, 1999.
Lutz Prechelt,
prechelt@ira.uka.de,
Last modified: Thu Nov 18 12:54:06 MET 1999