Benchmarking of learning algorithms
information repository page
Abstract: Proper benchmarking of (neural network and other) learning
architectures is a prerequisite for orderly progress in this field, yet
many published papers exhibit deficiencies in the benchmarking they
perform.
A workshop about NN benchmarking at NIPS*95 addressed the status quo of
benchmarking, common errors and how to avoid them, currently existing
benchmark collections, and, most prominently, a new benchmarking
facility including a results database.
This page contains pointers to written versions or slides of most of the
talks given at the workshop plus some related material.
The page is intended to be a repository for such information to be used
as a reference by researchers in the field. Note that most links lead
to Postscript documents.
Please send any additions or corrections you
might have to Lutz Prechelt (email@example.com).
- 1996-01-05: added David Rosen's
Data Sources page in 'Other sources of data' section.
- 1995-12-18: added
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/data/ in 'Other sources
of data' section.
- 1995-12-12: added paper by Zhu/Rohwer in 'Related Information' section.
Assessment of the status quo:
- Previously available NN benchmarking data
Advantages of these: UCI is large, growing, and popular;
Statlog has the largest and most orderly collection of results available
(in a book, though);
and Proben1 is the easiest to use and best supports reproducible experiments.
Elena and nnbench have no particular advantages.
Disadvantages: UCI and Proben1 have too few and too unstructured results
available; Proben1 is also inflexible and small; Statlog is partially
confidential, and neither its data nor its results collection is growing.
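The fixed, published data splits are what make Proben1-style results reproducible: every experimenter trains and tests on exactly the same partition. A minimal sketch of the idea (the dataset size, split proportions, and seed below are illustrative, not Proben1's actual conventions):

```python
import random

def fixed_splits(n_examples, seed=1):
    """Return index lists for a 50/25/25 train/validation/test split.

    The partition depends only on n_examples and seed, so any two
    experimenters reproduce exactly the same split.
    """
    rng = random.Random(seed)
    idx = list(range(n_examples))
    rng.shuffle(idx)
    n_train = n_examples // 2
    n_valid = n_examples // 4
    return (idx[:n_train],
            idx[n_train:n_train + n_valid],
            idx[n_train + n_valid:])

train, valid, test = fixed_splits(100)
# The same call always yields the same partition:
assert fixed_splits(100) == fixed_splits(100)
```

Publishing the seed and split rule (or the split itself, as Proben1 does) removes one major source of irreproducibility between papers.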
- Carl Rasmussen and Geoffrey Hinton.
A thoroughly designed benchmark collection
A proposal of data, terminology, and procedures, together with a facility
for collecting benchmarking results.
This is the newly proposed standard for benchmarking NN (and other)
learning algorithms. DELVE, developed at the University of Toronto, is
currently available as an alpha release.
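A results database of this kind only works if every submitted result is a structured, comparable record. The sketch below shows one plausible record layout; the field names are my illustration, not DELVE's actual schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class BenchmarkResult:
    dataset: str        # canonical dataset name
    task: str           # e.g. "classification" or "regression"
    method: str         # learning algorithm, fully specified
    split_id: int       # which predefined train/test split was used
    test_error: float   # primary performance measure
    std_error: float    # uncertainty of the estimate

# A hypothetical entry for a multilayer perceptron on a vowel task:
r = BenchmarkResult("vowel", "classification", "mlp-16h", 0, 0.43, 0.02)
record = asdict(r)   # plain dict, ready for storage or interchange
```

Making the split identifier and the uncertainty estimate mandatory fields is what allows results from different sites to be compared honestly.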
Other sources of data:
- See the appropriate section in Part 4 of the
- (Thanks to Nici Schraudolph <firstname.lastname@example.org>)
There is a large amount of game data about the board game Go available on
the net. Good starting points are
the Go game database project and
the Go game server.
The database holds several hundred thousand Go games and could, for
instance, be used for advanced reinforcement learning projects.
- (Thanks to Matthias Blume <email@example.com>)
Tony Robinson has made some speech data available via ftp:
In particular, the Peterson-Barney vowel data (file PetersonBarney.tar.Z)
seems useful for NN benchmarking. The data set is well documented in journals
and has been fairly widely used.
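As an illustration of the kind of task this data supports, here is a toy vowel classifier on (F1, F2) formant pairs. The frequency values and the three vowel labels below are rough stand-ins, not the measured Peterson-Barney data:

```python
import math

# Each vowel is represented by a few (F1, F2) formant pairs in Hz;
# these are illustrative figures, not the actual measurements.
SAMPLES = {
    "iy": [(270, 2290), (300, 2240), (250, 2320)],
    "aa": [(730, 1090), (710, 1100), (750, 1050)],
    "uw": [(300, 870), (320, 900), (290, 840)],
}

def centroids(samples):
    """Mean (F1, F2) point per vowel class."""
    out = {}
    for vowel, points in samples.items():
        n = len(points)
        out[vowel] = (sum(p[0] for p in points) / n,
                      sum(p[1] for p in points) / n)
    return out

def classify(point, cents):
    """Assign the vowel whose centroid is nearest in formant space."""
    return min(cents, key=lambda v: math.dist(point, cents[v]))

cents = centroids(SAMPLES)
print(classify((280, 2300), cents))  # high F2, low F1 -> "iy"
```

Even this nearest-centroid baseline separates well-spread vowels; the published data set adds F0 and F3 and many speakers, which is what makes it a non-trivial NN benchmark.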
- (Thanks to David Rosen <firstname.lastname@example.org>)
David Rosen maintains a Data Sources page, which is a part of the
WWW Virtual Library: Statistics.
- (Thanks to Partha Mitra <email@example.com>)
Los Alamos Systems
- (Thanks to Wouter.Favoreel@esat.kuleuven.ac.be)
A platform for the interchange of information and data in the field of
Other related information:
- Huaiyu Zhu and Richard Rohwer.
Bayesian regression filters and the issue of priors.
Tests of priors versus tests of learning algorithms:
is there a program that will do best on average
on the set of benchmarking data sets freely retrievable over the
Internet (the Internet Game)?
Is there a theoretically optimal algorithm for an arbitrary given prior?
A prior for the Internet Game, regardless of the problem domain.
Can the motivation of benchmarking be exploited in designing an algorithm?
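The "optimal algorithm for a given prior" question can be stated in standard decision-theoretic terms (the notation here is mine, not taken from the paper): with a prior P over target functions f, a loss L, and training data D drawn given f, the Bayes-optimal learning algorithm would be

```latex
A^{*} \;=\; \operatorname*{arg\,min}_{A}\;
  \mathbb{E}_{f \sim P}\, \mathbb{E}_{D \mid f}
  \bigl[\, L\bigl(A(D),\, f\bigr) \,\bigr]
```

Whether such an A* is computable, and what the implicit prior P of the Internet Game actually is, are exactly the open questions raised above.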
This page has received an award in the Neural Networks topic.
Last modified: Wed Mar 29 18:09:30 MET DST 2000