mahout-dev mailing list archives

From "Jake Mannix (JIRA)" <>
Subject [jira] Commented: (MAHOUT-364) [GSOC] Proposal to implement Neural Network with backpropagation learning on Hadoop
Date Wed, 14 Apr 2010 00:43:55 GMT


Jake Mannix commented on MAHOUT-364:


  Any form of BSD-style license _is_ compatible (the Mozilla, Eclipse, and Apache public licenses,
as well as, of course, BSD itself).  No form of GPL (GPL v2, v3, LGPL, or AGPL) is compatible.

  If you can get your fellow contributors to agree to change the licensing to something non-viral
(that's the issue with GPL: it requires that software which integrates with it also be GPL,
while Apache/BSD/MPL/EPL all say, "Steal it if you want, just admit you did it!"), we'd love
to get integration or even complete adoption/absorption of Neuroph.

  Can you talk it over with your compatriots about changing to something like the Apache
License?

> [GSOC] Proposal to implement Neural Network with backpropagation learning on Hadoop
> -----------------------------------------------------------------------------------
>                 Key: MAHOUT-364
>                 URL:
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Zaid Md. Abdul Wahab Sheikh
> Proposal Title: Implement Multi-Layer Perceptrons with backpropagation learning on Hadoop
(addresses issue Mahout-342)
> Student Name: Zaid Md. Abdul Wahab Sheikh
> Student E-mail: (gmail id) sheikh.zaid
> h2. I. Brief Description
> A feedforward neural network (NN) reveals several degrees of parallelism within it such
as weight parallelism, node parallelism, network parallelism, layer parallelism and training
parallelism. However, network-based parallelism requires fine-grained synchronization and communication
and thus is not suitable for map/reduce-based algorithms. On the other hand, training-set
parallelism is coarse-grained and can easily be exploited on Hadoop, which splits the
input among different mappers. Each mapper then propagates its 'InputSplit' through
its own copy of the complete neural network.
> The backpropagation algorithm will operate in batch mode. This is because updating a
common set of parameters after each training example creates a bottleneck for parallelization.
The overall error gradient vector calculation can be parallelized by calculating the gradients
from each training vector in the Mapper, combining them to get partial batch gradients and
then adding them in a reducer to get the overall batch gradient.
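Stripped of the Hadoop plumbing, this aggregation is plain vector addition; a minimal sketch (class and method names are illustrative, not an actual Mahout API):

```java
// Gradient aggregation only, with the Hadoop I/O stripped away: the
// combiner and the reducer perform the same element-wise addition.
class GradientSum {
    // Element-wise sum of per-example gradients -> partial batch gradient.
    static double[] combine(double[][] perExampleGradients) {
        double[] sum = new double[perExampleGradients[0].length];
        for (double[] g : perExampleGradients) {
            for (int i = 0; i < g.length; i++) {
                sum[i] += g[i];
            }
        }
        return sum;
    }

    // The reducer adds the partial sums the same way to obtain the
    // overall batch gradient.
    static double[] reduce(double[][] partialGradients) {
        return combine(partialGradients);
    }
}
```

Because addition is associative and commutative, running the same operation in the combiner and the reducer is safe regardless of how Hadoop groups the intermediate values.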
> In a similar manner, error function evaluations during line searches (for the conjugate
gradient and quasi-Newton algorithms) can be efficiently parallelized.
> Lastly, to avoid local minima in its error function, we can take advantage of training
session parallelism to start multiple training sessions in parallel with different initial
weights (simulated annealing).
> h2. II. Detailed Proposal
> The most important step is to design the base neural network classes in such a way that
other NN architectures like Hopfield nets, Boltzmann machines, SOM, etc., can be easily implemented
by deriving from these base classes. For that I propose to implement a set of core classes
that correspond to basic neural network concepts like artificial neuron, neuron layer, neuron
connections, weight, transfer function, input function, learning rule etc. This architecture
is inspired by that of the open-source Neuroph neural network framework.
This design of the base architecture allows for great flexibility in deriving newer NNs and
learning rules. All that needs to be done is to derive from the NeuralNetwork class, provide
the method for network creation, create a new training method by deriving from LearningRule,
and then add that learning rule to the network during creation. In addition, the API is very
intuitive and easy to understand (in comparison to other NN frameworks like Encog and JOONE).
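The extension pattern described here can be sketched as follows; the class names echo Neuroph's concepts, but the actual Mahout API was still to be designed, so everything below is illustrative:

```java
// Illustrative skeleton of the proposed extension pattern; real class
// names and signatures were still to be designed at proposal time.
abstract class LearningRule {
    abstract void learn(NeuralNetwork net, double[][] inputs, double[][] targets);
}

abstract class NeuralNetwork {
    protected LearningRule learningRule;

    // Subclasses build their topology here (layers, connections, weights).
    abstract void createNetwork();

    void learn(double[][] inputs, double[][] targets) {
        learningRule.learn(this, inputs, targets);
    }
}

// A new architecture only derives, builds its topology, and picks a rule.
class MultiLayerPerceptron extends NeuralNetwork {
    MultiLayerPerceptron(int... layerSizes) {
        createNetwork();                      // build layers for layerSizes
        this.learningRule = new BackPropagation();
    }

    @Override
    void createNetwork() { /* create layers and fully connect them */ }
}

class BackPropagation extends LearningRule {
    @Override
    void learn(NeuralNetwork net, double[][] inputs, double[][] targets) {
        /* iterate: forward pass, backpropagate error, update weights */
    }
}
```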
> h3. The approach to parallelization in Hadoop:
> In the Driver class:
> - The input parameters are read and the NeuralNetwork, with a specified LearningRule (training
algorithm), is created.
> - Initial weight values are randomly generated and written to the FileSystem. If the number
of training sessions (for simulated annealing) is specified, multiple sets of initial weight
values are generated.
> - Training is started by calling the NeuralNetwork's learn() method. For each iteration,
every time the error gradient vector needs to be calculated, the method submits a Job where
the input path to the training-set vectors and various key properties (like path to the stored
weight values) are set. The gradient vectors calculated by the Reducers are written back to
an output path in the FileSystem.
> - After the JobClient.runJob() returns, the gradient vectors are retrieved from the FileSystem
and tested to see if the stopping criterion is satisfied. The weights are then updated, using
the method implemented by the particular LearningRule. For line searches, each error function
evaluation is again done by submitting a job.
> - The NN is trained in iterations until it converges.
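The initial-weight generation in the first driver step can use the Nguyen-Widrow scheme the proposal adopts for its deliverables. One common formulation, as a sketch (names are illustrative; writing the result to the FileSystem is omitted):

```java
import java.util.Random;

// Nguyen-Widrow initialization, sketched for a single hidden layer:
// draw weights uniformly in [-0.5, 0.5), then rescale each hidden
// unit's weight vector to length beta = 0.7 * h^(1/n), where
// n = number of inputs and h = number of hidden units.
class NguyenWidrow {
    static double[][] init(int n, int h, long seed) {
        Random rnd = new Random(seed);
        double beta = 0.7 * Math.pow(h, 1.0 / n);
        double[][] w = new double[h][n];
        for (int j = 0; j < h; j++) {
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                w[j][i] = rnd.nextDouble() - 0.5;   // uniform in [-0.5, 0.5)
                norm += w[j][i] * w[j][i];
            }
            norm = Math.sqrt(norm);
            for (int i = 0; i < n; i++) {
                w[j][i] *= beta / norm;             // rescale to length beta
            }
        }
        return w;
    }
}
```

Seeding the generator explicitly also makes it easy to produce the multiple distinct weight sets needed for parallel training sessions.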
> In the Mapper class:
> - Each Mapper is initialized using the configure method: the weights are retrieved and
the complete NeuralNetwork is created.
> - The map function then takes in the training vectors as key/value pairs (the key is
ignored), runs them through the NN to calculate the outputs and backpropagates the errors
to find out the error gradients. The error gradient vectors are then output as key/value pairs
where all the keys are set to a common value, such as the training session number (for each
training session, all keys in the outputs of all the mappers have to be identical).
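Stripped of the Hadoop I/O, what one map() call computes can be shown for the smallest case, a single sigmoid output unit with sum-of-squares error; a real network backpropagates through every layer, but the per-example shape is the same (names are illustrative):

```java
// Per-example gradient for a single sigmoid unit with
// sum-of-squares error E = 0.5 * (y - t)^2, y = sigmoid(w . x).
class ExampleGradient {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // dE/dw_i = delta * x_i, with delta = (y - t) * y * (1 - y).
    static double[] gradient(double[] w, double[] x, double t) {
        double net = 0.0;
        for (int i = 0; i < w.length; i++) {
            net += w[i] * x[i];
        }
        double y = sigmoid(net);
        double delta = (y - t) * y * (1.0 - y);
        double[] g = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            g[i] = delta * x[i];
        }
        return g;
    }
}
```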
> In the Combiner class:
> - Iterates through all the individual error gradient vectors output by the mapper (since
they all have the same key) and adds them up to get a partial batch gradient.
> In the Reducer class:
> - A single Reducer combines all the partial batch gradients produced on the map side
to get the overall batch gradient.
> - The final error gradient vector is written back to the FileSystem.
> h3. I propose to complete all of the following sub-tasks during GSoC 2010:
> Implementation of the Backpropagation algorithm:
> - Initialization of weights: using the Nguyen-Widrow algorithm to select the initial
range of starting weight values.
> - Input, transfer and error functions: implement basic input functions like WeightedSum
and transfer functions like Sigmoid, Gaussian, tanh and linear. Implement the sum-of-squares
error function.
> - Optimization methods to update the weights: (a) Batch Gradient descent, with momentum
and a variable learning rate method [2] (b) A Conjugate gradient method with Brent's line search
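The batch update in (a) can be sketched as follows; the adaptation factors (1.05 and 0.5) are illustrative choices in the spirit of [2], not values fixed by the proposal:

```java
// Batch gradient descent with momentum and a simple variable learning
// rate: grow eta slightly while the batch error falls, cut it sharply
// when the error rises. All names and constants are illustrative.
class MomentumUpdate {
    double eta;                 // learning rate
    final double alpha;         // momentum coefficient
    final double[] prevDelta;   // previous weight change, per weight

    MomentumUpdate(double eta, double alpha, int dim) {
        this.eta = eta;
        this.alpha = alpha;
        this.prevDelta = new double[dim];
    }

    // w <- w - eta * gradient + alpha * prevDelta
    void step(double[] w, double[] gradient) {
        for (int i = 0; i < w.length; i++) {
            double delta = -eta * gradient[i] + alpha * prevDelta[i];
            w[i] += delta;
            prevDelta[i] = delta;
        }
    }

    // Variable learning rate based on the batch error trend.
    void adapt(double previousError, double currentError) {
        eta *= (currentError < previousError) ? 1.05 : 0.5;
    }
}
```

In the driver loop described earlier, step() would run once per job round, after the reducer's batch gradient has been read back from the FileSystem.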
> Improving generalization:
> - Validating the network to test for overfitting (Early stopping method)
> - Regularization (weight decay method)
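The weight decay method adds a quadratic penalty (lambda/2) * sum w^2 to the error, which contributes lambda * w to each weight's gradient. A sketch (lambda would be chosen by validation; names are illustrative):

```java
// Weight decay regularization: penalty term and its gradient
// contribution, applied on top of the data error gradient.
class WeightDecay {
    // (lambda / 2) * sum of squared weights.
    static double penalty(double[] w, double lambda) {
        double s = 0.0;
        for (double v : w) {
            s += v * v;
        }
        return 0.5 * lambda * s;
    }

    // Adds the decay term lambda * w_i to an error gradient in place.
    static void addToGradient(double[] gradient, double[] w, double lambda) {
        for (int i = 0; i < w.length; i++) {
            gradient[i] += lambda * w[i];
        }
    }
}
```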
> Create examples for:
> - Classification: using the Abalone Data Set  from UCI Machine Learning Repository
> - Classification, Regression: Breast Cancer Wisconsin (Prognostic) Data Set
> If time permits, also implement:
> - Resilient Backpropagation (RPROP)
> h2. III. Week Plan with list of deliverables
> * (Till May 23rd, community bonding period)
> Brainstorm with my mentor and the Apache Mahout community to come up with an optimal
design for an extensible Neural Network framework. Code prototypes to identify potential problems
and/or investigate new solutions.
> Deliverable: A detailed report or design document on how to implement the basic Neural
Network framework and the learning algorithms.
> * (May 24th, coding starts) Week 1: 
> Deliverables: Basic Neural network classes (Neuron, Connection, Weight, Layer, LearningRule,
NeuralNetwork etc) and the various input, transfer and error functions mentioned previously.
> * (May 31st) Week 2 and Week 3: 
> Deliverable: Driver, Mapper, Combiner and Reducer classes with basic functionality to
run a feedforward Neural Network on Hadoop (no training methods yet, weights are generated
using Nguyen-Widrow algorithm). 
> * (June 14th) Week 4: 
> Deliverable: Backpropagation algorithm using standard Batch Gradient descent.
> * (June 21st) Week 5: 
> Deliverables: Variable learning rate and momentum during Batch Gradient descent. Support
for validation tests. Run some large-scale tests.
> * (June 28th) Week 6: 
> Deliverable: Support for Early stopping and Regularization (weight decay) during training.
> * (July 5th) Week 7 and Week 8: 
> Deliverable: Conjugate gradient method with Brent's line search algorithm. 
> * (July 19th) Week 9: 
> Deliverable: Write unit tests. Do bigger scale tests for both batch gradient descent
and conjugate gradient method.
> * (July 26th) Week 10 and Week 11:
> Deliverable: 2 examples of classification and regression on real-world datasets from
UCI Machine Learning Repository. More tests.
> * (August 9th, tentative 'pencils down' date) Week 12: 
> Deliverable: Wind up the work. Scrub code. Improve documentation and write tutorials (on the
wiki).
> * (August 16: Final evaluation)
> h2. IV. Additional Information
> I am a final year Computer Science student at NIT Allahabad (India) graduating in May.
For my final year project/thesis, I am working on Open Domain Question Answering. I participated
in GSoC last year for the Apertium machine translation system.
I am familiar with the three major open-source Neural Network frameworks in Java (JOONE, Encog
and Neuroph), since I have used them in past projects on fingerprint recognition and face recognition
(during a summer course on image and speech processing). My research interests are machine
learning and statistical natural language processing, and I will be enrolling for a Ph.D. next
semester (i.e., next fall) at the same institute.
> I have no specific time constraints throughout the GSoC period. I will devote a minimum
of 6 hours everyday to GSoC.
> Time offset: UTC+5:30 (IST)
> h2. V. References
> [1] S. McLoone and G. W. Irwin, "Fast parallel off-line training of multilayer perceptrons,"
IEEE Transactions on Neural Networks, 1997
> [2] W. Schiffmann, M. Joost and R. Werner, "Optimization of the backpropagation algorithm
for training multilayer perceptrons," 1994
> [3] C. T. Chu, S. K. Kim, Y. A. Lin, et al., "Map-Reduce for Machine Learning on Multicore,"
in NIPS, 2006
> [4] C. M. Bishop, Neural Networks for Pattern Recognition, 1995 [BOOK]

This message is automatically generated by JIRA.