Preface

Introduction
    What Is a Neural Network?
    The Human Brain
    Models of a Neuron
    Neural Networks Viewed as Directed Graphs
    Feedback
    Network Architectures
    Knowledge Representation
    Learning Processes
    Learning Tasks
    Concluding Remarks
    Notes and References

Rosenblatt's Perceptron
    Introduction
    Perceptron
    The Perceptron Convergence Theorem
    Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment
    Computer Experiment: Pattern Classification
    The Batch Perceptron Algorithm
    Summary and Discussion
    Notes and References
    Problems

Model Building through Regression
    Introduction
    Linear Regression Model: Preliminary Considerations
    Maximum a Posteriori Estimation of the Parameter Vector
    Relationship Between Regularized Least-Squares Estimation and MAP Estimation
    Computer Experiment: Pattern Classification
    The Minimum-Description-Length Principle
    Finite Sample-Size Considerations
    The Instrumental-Variables Method
    Summary and Discussion
    Notes and References
    Problems

The Least-Mean-Square Algorithm
    Introduction
    Filtering Structure of the LMS Algorithm
    Unconstrained Optimization: A Review
    The Wiener Filter
    The Least-Mean-Square Algorithm
    Markov Model Portraying the Deviation of the LMS Algorithm from the Wiener Filter
    The Langevin Equation: Characterization of Brownian Motion
    Kushner's Direct-Averaging Method
    Statistical LMS Learning Theory for Small Learning-Rate Parameter
    Computer Experiment I: Linear Prediction
    Computer Experiment II: Pattern Classification
    Virtues and Limitations of the LMS Algorithm
    Learning-Rate Annealing Schedules
    Summary and Discussion
    Notes and References
    Problems

Multilayer Perceptrons
    Introduction
    Some Preliminaries
    Batch Learning and On-Line Learning
    The Back-Propagation Algorithm
    XOR Problem
    Heuristics for Making the Back-Propagation Algorithm Perform Better
    Computer Experiment: Pattern Classification
    Back Propagation and Differentiation
    The Hessian and Its Role in On-Line Learning
    Optimal Annealing and Adaptive Control of the Learning Rate
    Generalization
    Approximations of Functions
    Cross-Validation
    Complexity Regularization and Network Pruning
    Virtues and Limitations of Back-Propagation Learning
    Supervised Learning Viewed as an Optimization Problem
    Convolutional Networks
    Nonlinear Filtering
    Small-Scale Versus Large-Scale Learning Problems
    Summary and Discussion
    Notes and References
    Problems

Kernel Methods and Radial-Basis Function Networks
    Introduction
    Cover's Theorem on the Separability of Patterns
    The Interpolation Problem
    Radial-Basis-Function Networks
    K-Means Clustering
    Recursive Least-Squares Estimation of the Weight Vector
    Hybrid Learning Procedure for RBF Networks
    Computer Experiment: Pattern Classification
    Interpretations of the Gaussian Hidden Units
    Kernel Regression and Its Relation to RBF Networks
    Summary and Discussion
    Notes and References
    Problems

Support Vector Machines
    Introduction
    Optimal Hyperplane for Linearly Separable Patterns
    Optimal Hyperplane for Nonseparable Patterns
    The Support Vector Machine Viewed as a Kernel Machine
    Design of Support Vector Machines
    XOR Problem
    Computer Experiment: Pattern Classification
    Regression: Robustness Considerations
    Optimal Solution of the Linear Regression Problem
    The Representer Theorem and Related Issues
    Summary and Discussion
    Notes and References
    Problems

Regularization Theory
    Introduction
    Hadamard's Conditions for Well-Posedness
    Tikhonov's Regularization Theory
    Regularization Networks
    Generalized Radial-Basis-Function Networks
    The Regularized Least-Squares Estimator: Revisited
    Additional Notes of Interest on Regularization
    Estimation of the Regularization Parameter
    Semisupervised Learning
    Manifold Regularization: Preliminary Considerations
    Differentiable Manifolds
    Generalized Regularization Theory
    Spectral Graph Theory
    Generalized Representer Theorem
    Laplacian Regularized Least-Squares Algorithm
    Experiments on Pattern Classification Using Semisupervised Learning
    Summary and Discussion
    Notes and References
    Problems

Principal-Components Analysis
    Introduction
    Principles of Self-Organization
    Self-Organized Feature Analysis
    Principal-Components Analysis: Perturbation Theory
    Hebbian-Based Maximum Eigenfilter
    Hebbian-Based Principal-Components Analysis
    Case Study: Image Coding
    Kernel Principal-Components Analysis
    Basic Issues Involved in the Coding of Natural Images
    Kernel Hebbian Algorithm
    Summary and Discussion
    Notes and References
    Problems

Self-Organizing Maps
    Introduction
    Two Basic Feature-Mapping Models
    Self-Organizing Map
    Properties of the Feature Map
    Computer Experiment I: Disentangling Lattice Dynamics Using SOM
    Contextual Maps
    Hierarchical Vector Quantization
    Kernel Self-Organizing Map
    Computer Experiment II: Disentangling Lattice Dynamics Using Kernel SOM
    Relationship Between Kernel SOM and Kullback-Leibler Divergence
    Summary and Discussion
    Notes and References
    Problems

Information-Theoretic Learning Models
    Introduction
    Entropy
    Maximum-Entropy Principle
    Mutual Information
    Kullback-Leibler Divergence
    Copulas
    Mutual Information as an Objective Function to Be Optimized
    Maximum Mutual Information Principle
    Infomax and Redundancy Reduction
    Spatially Coherent Features
    Spatially Incoherent Features
    Independent-Components Analysis
    Sparse Coding of Natural Images and Comparison with ICA Coding
    Natural-Gradient Learning for Independent-Components Analysis
    Maximum-Likelihood Estimation for Independent-Components Analysis
    Maximum-Entropy Learning for Blind Source Separation
    Maximization of Negentropy for Independent-Components Analysis
    Coherent Independent-Components Analysis
    Rate Distortion Theory and Information Bottleneck
    Optimal Manifold Representation of Data
    Computer Experiment: Pattern Classification
    Summary and Discussion
    Notes and References
    Problems

Stochastic Methods Rooted in Statistical Mechanics
    Introduction
    Statistical Mechanics
    Markov Chains
    Metropolis Algorithm
    Simulated Annealing
    Gibbs Sampling
    Boltzmann Machine
    Logistic Belief Nets
    Deep Belief Nets
    Deterministic Annealing
    Analogy of Deterministic Annealing with Expectation-Maximization Algorithm
    Summary and Discussion
    Notes and References
    Problems

Dynamic Programming
    Introduction
    Markov Decision Process
    Bellman's Optimality Criterion
    Policy Iteration
    Value Iteration
    Approximate Dynamic Programming: Direct Methods
    Temporal-Difference Learning
    Q-Learning
    Approximate Dynamic Programming: Indirect Methods
    Least-Squares Policy Evaluation
    Approximate Policy Iteration
    Summary and Discussion
    Notes and References
    Problems

Neurodynamics
    Introduction
    Dynamic Systems
    Stability of Equilibrium States
    Attractors
    Neurodynamic Models
    Manipulation of Attractors as a Recurrent Network Paradigm
    Hopfield Model
    The Cohen-Grossberg Theorem
    Brain-State-in-a-Box Model
    Strange Attractors and Chaos
    Dynamic Reconstruction of a Chaotic Process
    Summary and Discussion
    Notes and References
    Problems

Bayesian Filtering for State Estimation of Dynamic Systems
    Introduction
    State-Space Models
    Kalman Filters
    The Divergence Phenomenon and Square-Root Filtering
    The Extended Kalman Filter
    The Bayesian Filter
    Cubature Kalman Filter: Building on the Kalman Filter
    Particle Filters
    Computer Experiment: Comparative Evaluation of Extended Kalman and Particle Filters
    Kalman Filtering in Modeling of Brain Functions
    Summary and Discussion
    Notes and References
    Problems

Dynamically Driven Recurrent Networks
    Introduction
    Recurrent Network Architectures
    Universal Approximation Theorem
    Controllability and Observability
    Computational Power of Recurrent Networks
    Learning Algorithms
    Back Propagation Through Time
    Real-Time Recurrent Learning
    Vanishing Gradients in Recurrent Networks
    Supervised Training Framework for Recurrent Networks Using Nonlinear Sequential State Estimators
    Computer Experiment: Dynamic Reconstruction of Mackey-Glass Attractor
    Adaptivity Considerations
    Case Study: Model Reference Applied to Neurocontrol
    Summary and Discussion
    Notes and References
    Problems

Bibliography

Index