Foreword by Jerome Feldman
Springer-Verlag, Berlin, New York, 1996 (502 p., 350 illustrations).

Sample Chapter 7: "The Backpropagation Algorithm"
1 The biological paradigm
1.1 Neural computation
1.1.1 Natural and artificial neural networks
1.1.2 Models of computation
1.1.3 Elements of a computing model
1.2 Networks of neurons
1.2.1 Structure of the neurons
1.2.2 Transmission of information
1.2.3 Information processing at the neurons and synapses
1.2.4 Storage of information - Learning
1.2.5 The neuron - a self-organizing system
1.3 Artificial neural networks
1.3.1 Networks of primitive functions
1.3.2 Approximation of functions
1.3.3 Caveat
1.4 Historical and bibliographical remarks
2 Threshold logic
2.1 Networks of functions
2.1.1 Feed-forward and recurrent networks
2.1.2 The computing units
2.2 Synthesis of Boolean functions
2.2.1 Conjunction, disjunction, negation
2.2.2 Geometric interpretation
2.2.3 Constructive synthesis
2.3 Equivalent networks
2.3.1 Weighted and unweighted networks
2.3.2 Absolute and relative inhibition
2.3.3 Binary signals and pulse coding
2.4 Recurrent networks
2.4.1 Stored state networks
2.4.2 Finite automata
2.4.3 Finite automata and recurrent networks
2.4.4 A first classification of neural networks
2.5 Harmonic analysis of logical functions
2.5.1 General expression
2.5.2 The Hadamard-Walsh transform
2.5.3 Applications of threshold logic
2.6 Historical and bibliographical remarks
3 Weighted Networks - The Perceptron
3.1 Perceptrons and parallel processing
3.1.1 Perceptrons as weighted threshold elements
3.1.2 Computational limits of the perceptron model
3.2 Implementation of logical functions
3.2.1 Geometric interpretation
3.2.2 The XOR problem
3.3 Linearly separable functions
3.3.1 Linear separability
3.3.2 Duality of input space and weight space
3.3.3 The error function in weight space
3.3.4 General decision curves
3.4 Applications and biological analogy
3.4.1 Edge detection with perceptrons
3.4.2 The structure of the retina
3.4.3 Pyramidal networks and the neocognitron
3.4.4 The silicon retina
3.5 Historical and bibliographical remarks
4 Perceptron learning
4.1 Learning algorithms for neural networks
4.1.1 Classes of learning algorithms
4.1.2 Vector notation
4.1.3 Absolute linear separability
4.1.4 The error surface and the search method
4.2 Algorithmic learning
4.2.1 Geometric visualization
4.2.2 Convergence of the algorithm
4.2.3 Accelerating convergence
4.2.4 The pocket algorithm
4.2.5 Complexity of perceptron learning
4.3 Linear programming
4.3.1 Inner points of polytopes
4.3.2 Linear separability as linear optimization
4.3.3 Karmarkar's Algorithm
4.4 Historical and bibliographical remarks
5 Unsupervised learning and clustering algorithms
5.1 Competitive learning
5.1.1 Generalization of the perceptron problem
5.1.2 Unsupervised learning through competition
5.2 Convergence analysis
5.2.1 The one-dimensional case - Energy function
5.2.2 Multidimensional case - The classical methods
5.2.3 Unsupervised learning as minimization problem
5.2.4 Stability of the solutions
5.3 Principal component analysis
5.3.1 Unsupervised reinforcement learning
5.3.2 Convergence of the learning algorithm
5.3.3 Multiple principal components
5.4 Examples
5.4.1 Pattern recognition
5.4.2 Image compression
5.5 Historical and bibliographical remarks
6 One and two layered networks
6.1 Structure and geometric visualization
6.1.1 Network architecture
6.1.2 The XOR problem revisited
6.1.3 Geometric visualization
6.2 Counting regions in input and weight space
6.2.1 Weight space regions for the XOR problem
6.2.2 Bipolar vectors
6.2.3 Projection of the solution regions
6.2.4 Geometric interpretation
6.3 Regions for two layered networks
6.3.1 Regions in weight space for the XOR problem
6.3.2 Number of regions in general
6.3.3 Consequences
6.3.4 The Vapnik-Chervonenkis dimension
6.3.5 The problem of local minima
6.4 Historical and bibliographical remarks
7 The backpropagation algorithm
7.1 Learning as gradient descent
7.1.1 Differentiable activation functions
7.1.2 Regions in input space
7.1.3 Local minima of the error function
7.2 General feed-forward networks
7.2.1 The learning problem
7.2.2 Derivatives of network functions
7.2.3 Steps of the backpropagation algorithm
7.2.4 Learning with Backpropagation
7.3 The case of layered networks
7.3.1 Extended network
7.3.2 Steps of the algorithm
7.3.3 Backpropagation in matrix form
7.3.4 The locality of backpropagation
7.3.5 An Example
7.4 Recurrent networks
7.4.1 Backpropagation through time
7.4.2 Hidden Markov Models
7.4.3 Variational problems
7.5 Historical and bibliographical remarks
8 Fast learning algorithms
8.1 Introduction - Classical backpropagation
8.1.1 Backpropagation with momentum
8.1.2 The fractal geometry of backpropagation
8.2 Some simple improvements to backpropagation
8.2.1 Initial weight selection
8.2.2 Clipped derivatives and offset term
8.2.3 Reducing the number of floating-point operations
8.2.4 Data decorrelation
8.3 Adaptive step algorithms
8.3.1 Silva and Almeida's algorithm
8.3.2 Delta-bar-delta
8.3.3 RPROP
8.3.4 The Dynamic Adaption Algorithm
8.4 Second-order algorithms
8.4.1 Quickprop
8.4.2 Second-order backpropagation
8.5 Relaxation methods
8.5.1 Weight and node perturbation
8.5.2 Symmetric and asymmetric relaxation
8.5.3 A final thought on taxonomy
8.6 Historical and bibliographical remarks
9 Statistics and Neural Networks
9.1 Linear and nonlinear regression
9.1.1 The problem of good generalization
9.1.2 Linear regression
9.1.3 Nonlinear units
9.1.4 Computing the prediction error
9.1.5 The jackknife and cross-validation
9.1.6 Committees of networks
9.2 Multiple regression
9.2.1 Visualization of the solution regions
9.2.2 Linear equations and the pseudoinverse
9.2.3 The hidden layer
9.2.4 Computation of the pseudoinverse
9.3 Classification networks
9.3.1 An application: NETtalk
9.3.2 The Bayes property of classifier networks
9.3.3 Connectionist speech recognition
9.3.4 Autoregressive models for time series analysis
9.4 Historical and bibliographical remarks
10 The complexity of learning
10.1 Network functions
10.1.1 Learning algorithms for multilayer networks
10.1.2 Hilbert's problem and computability
10.1.3 Kolmogorov's theorem
10.2 Function approximation
10.2.1 The one-dimensional case
10.2.2 The multidimensional case
10.3 Complexity of learning problems
10.3.1 Complexity classes
10.3.2 NP-complete learning problems
10.3.3 Complexity of learning with AND-OR networks
10.3.4 Simplifications of the network architecture
10.3.5 Learning with hints
10.4 Historical and bibliographical remarks
11 Fuzzy Logic
11.1 Fuzzy sets and fuzzy logic
11.1.1 Imprecise data and imprecise rules
11.1.2 The fuzzy set concept
11.1.3 Geometric representation of fuzzy sets
11.1.4 Set theory, logic operators and geometry
11.1.5 Families of fuzzy operators
11.2 Fuzzy inferences
11.2.1 Inferences from imprecise data
11.2.2 Fuzzy numbers and inverse operation
11.3 Control with fuzzy logic
11.3.1 Fuzzy controllers
11.3.2 Fuzzy networks
11.3.3 Function approximation with fuzzy methods
11.3.4 The eye as a fuzzy system - color vision
11.4 Historical and bibliographical remarks
12 Associative Networks
12.1 Associative pattern recognition
12.1.1 Recurrent networks and types of associative memories
12.1.2 Structure of an associative memory
12.1.3 The eigenvector automaton
12.2 Associative learning
12.2.1 Hebbian Learning - The correlation matrix
12.2.2 Geometric interpretation of Hebbian learning
12.2.3 Networks as dynamical systems - Some experiments
12.2.4 Another visualization
12.3 The capacity problem
12.4 The pseudoinverse
12.4.1 Definition and properties of the pseudoinverse
12.4.2 Orthogonal projections
12.4.3 Holographic memories
12.4.4 Translation invariant pattern recognition
12.5 Historical and bibliographical remarks
13 The Hopfield Model
13.1 Synchronous and asynchronous networks
13.1.1 Recursive networks with stochastic dynamics
13.1.2 The bidirectional associative memory
13.1.3 The energy function
13.2 Definition of Hopfield networks
13.2.1 Asynchronous networks
13.2.2 Examples of the model
13.2.3 Isomorphism between the Hopfield and Ising models
13.3 Convergence to stable states
13.3.1 Dynamics of Hopfield networks
13.3.2 Convergence proof
13.3.3 Hebbian learning
13.4 Equivalence of Hopfield and perceptron learning
13.4.1 Perceptron learning in Hopfield networks
13.4.2 Complexity of learning in Hopfield models
13.5 Parallel combinatorics
13.5.1 NP-complete problems and massive parallelism
13.5.2 The multiflop problem
13.5.3 The eight rooks problem
13.5.4 The eight queens problem
13.5.5 The traveling salesman
13.5.6 The limits of Hopfield networks
13.6 Implementation of Hopfield networks
13.6.1 Electrical implementation
13.6.2 Optical implementation
13.7 Historical and bibliographical remarks
14 Stochastic networks
14.1 Variations of the Hopfield model
14.1.1 The continuous model
14.2 Stochastic systems
14.2.1 Simulated annealing
14.2.2 Stochastic neural networks
14.2.3 Markov chains
14.2.4 The Boltzmann distribution
14.2.5 Physical meaning of the Boltzmann distribution
14.3 Learning algorithms and applications
14.3.1 Boltzmann learning
14.3.2 Combinatorial optimization
14.4 Historical and bibliographical remarks
15 Kohonen networks
15.1 Self-organization
15.1.1 Charting input space
15.1.2 Topology preserving maps in the brain
15.2 Kohonen's model
15.2.1 Learning algorithm
15.2.2 Mapping low-dimensional spaces with high-dimensional grids
15.3 Analysis of convergence
15.3.1 Potential function - the one-dimensional case
15.3.2 The two-dimensional case
15.3.3 Effect of a unit's neighborhood
15.3.4 Metastable states
15.3.5 What dimension for Kohonen networks?
15.4 Applications
15.4.1 Approximation of functions
15.4.2 Inverse kinematics
15.5 Historical and bibliographical remarks
16 Modular Neural Networks
16.1 Constructive algorithms for modular networks
16.1.1 Cascade correlation
16.1.2 Optimal modules and mixtures of experts
16.2 Hybrid networks
16.2.1 The ART architectures
16.2.2 Maximum entropy
16.2.3 Counterpropagation networks
16.2.4 Spline networks
16.2.5 Radial basis functions
16.3 Historical and bibliographical remarks
17 Genetic Algorithms
17.1 Coding and operators
17.1.1 Optimization problems
17.1.2 Methods of stochastic optimization
17.1.3 Genetic coding
17.1.4 Information exchange with genetic operators
17.2 Properties of genetic algorithms
17.2.1 Convergence analysis
17.2.2 Deceptive problems
17.2.3 Genetic drift
17.2.4 Gradient methods versus genetic algorithms
17.3 Neural networks and genetic algorithms
17.3.1 The problem of symmetries
17.3.2 A numerical experiment
17.3.3 Other applications of GAs
17.4 Historical and bibliographical remarks
18 Hardware for neural networks
18.1 Taxonomy of neural hardware
18.1.1 Performance requirements
18.1.2 Types of neurocomputers
18.2 Analog neural networks
18.2.1 Coding
18.2.2 VLSI transistor circuits
18.2.3 Transistors with stored charge
18.2.4 CCD components
18.3 Digital networks
18.3.1 Numerical representation of weights and signals
18.3.2 Vector and signal processors
18.3.3 Systolic arrays
18.3.4 One-dimensional structures
18.4 Innovative computer architectures
18.4.1 VLSI microprocessors for neural networks
18.4.2 Optical computers
18.4.3 Pulse coded networks
18.5 Historical and bibliographical remarks