hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Does Hama Graph provides any file reader interface during running time ?
Date Thu, 20 Sep 2012 13:22:40 GMT
Hi,

nice idea, but I'm not sure the graph module really fits your
needs.
In backprop you need to feed the input to the different neurons in your
input layer and then forward-propagate it until you reach the output
layer. Calculating the error for this single step in your architecture
would consume many supersteps. That is very inefficient in my
opinion, but let's set that concern aside for now.

Assume you have an n by m matrix that contains your whole training set,
where the m-th column holds the outcome for the features in the other
columns.
An input vertex should be able to read a row of its corresponding
column vector from the training set, and the output neurons need to do
the same.
Good news: you can do this by reading a file within the setup function of a
vertex, or by reading it line by line when compute is called. You can access
filesystems pretty easily with the Hadoop DFS API. The class is called
FileSystem, and you can get an instance with
FileSystem.get(Configuration conf); any search engine will turn up the
details.
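
To make the row layout concrete, here is a minimal sketch of parsing such a
training set, assuming comma-separated rows with the outcome in the last
column. The class names TrainingRow and TrainingSetReader are made up for
illustration and are not part of Hama. The sketch reads from a plain
java.io.Reader so it runs without Hadoop; on a cluster the Reader could wrap
the stream returned by FileSystem.get(conf).open(path).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** One training example: the first m-1 columns are features, the m-th is the outcome. */
final class TrainingRow {
    final double[] features;
    final double outcome;

    TrainingRow(double[] features, double outcome) {
        this.features = features;
        this.outcome = outcome;
    }
}

/** Hypothetical helper, not a Hama class. */
final class TrainingSetReader {
    /**
     * Parses comma-separated rows from any Reader.
     * Assumption: with Hadoop, the Reader would be an InputStreamReader around
     * FileSystem.get(conf).open(path); it is left abstract here.
     */
    static List<TrainingRow> readAll(Reader source) {
        List<TrainingRow> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] cols = line.split(",");
                double[] features = new double[cols.length - 1];
                for (int i = 0; i < features.length; i++) {
                    features[i] = Double.parseDouble(cols[i].trim());
                }
                double outcome = Double.parseDouble(cols[cols.length - 1].trim());
                rows.add(new TrainingRow(features, outcome));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two rows, two features each, outcome in the third column.
        String sample = "0.0,1.0,1.0\n1.0,1.0,0.0\n";
        List<TrainingRow> rows = readAll(new StringReader(sample));
        System.out.println(rows.size() + " rows, first outcome = " + rows.get(0).outcome);
    }
}
```

Reading everything in setup keeps compute simple, at the cost of holding the
rows in memory; reading line by line inside compute trades that memory for
more bookkeeping per superstep.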

Now here is my experience with raw BSP and neural networks, if you
want to weigh it against the graph module:
- partition the neurons horizontally (through the layers), not by layer
- weights must be averaged across multiple tasks
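
The averaging step in the second point can be sketched as follows, assuming
each task reports a weight vector of the same length. WeightAveraging is a
made-up name for illustration; how the vectors travel between tasks (for
example as BSP messages) is left out.

```java
/** Hypothetical helper, not a Hama class. */
final class WeightAveraging {
    /** Element-wise mean of the weight vectors reported by each task. */
    static double[] average(double[][] perTaskWeights) {
        if (perTaskWeights.length == 0) {
            throw new IllegalArgumentException("need at least one task");
        }
        int dim = perTaskWeights[0].length;
        double[] mean = new double[dim];
        for (double[] weights : perTaskWeights) {
            if (weights.length != dim) {
                throw new IllegalArgumentException("weight vectors differ in length");
            }
            for (int i = 0; i < dim; i++) {
                mean[i] += weights[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            mean[i] /= perTaskWeights.length;
        }
        return mean;
    }

    public static void main(String[] args) {
        double[][] fromTasks = { {1.0, 2.0}, {3.0, 4.0} };
        double[] mean = average(fromTasks);
        System.out.println(mean[0] + ", " + mean[1]);
    }
}
```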

I concluded for myself that it is considerably better to implement a
function optimizer with raw BSP to train the weights (a simple
stochastic gradient descent works for almost every normal use case
if your network has a convex cost function).
Of course this doesn't work well for higher dimensionalities, but more
data usually wins, even with simpler models. In the end you can always
boost it anyway.
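
As a rough illustration of the optimizer idea, here is a minimal
single-process sketch of stochastic gradient descent. It uses a one-feature
linear model with squared error as a stand-in for a convex cost function; it
is not a neural network and not BSP code. The class name Sgd, the learning
rate, the step count and the seed are all assumptions chosen for the example.

```java
import java.util.Random;

/** Hypothetical example class: SGD on y = slope * x + intercept with squared error. */
final class Sgd {
    /** Returns {slope, intercept} after the given number of single-sample updates. */
    static double[] fit(double[] xs, double[] ys, double learningRate, int steps, long seed) {
        double slope = 0.0;
        double intercept = 0.0;
        Random rng = new Random(seed); // fixed seed keeps the run reproducible
        for (int step = 0; step < steps; step++) {
            int i = rng.nextInt(xs.length);          // pick one sample at random
            double error = slope * xs[i] + intercept - ys[i];
            // Gradient of 0.5 * error^2 with respect to slope and intercept.
            slope -= learningRate * error * xs[i];
            intercept -= learningRate * error;
        }
        return new double[] { slope, intercept };
    }

    public static void main(String[] args) {
        // Noise-free samples of y = 2x + 1.
        double[] xs = { 0.0, 1.0, 2.0, 3.0 };
        double[] ys = { 1.0, 3.0, 5.0, 7.0 };
        double[] model = fit(xs, ys, 0.05, 10_000, 42L);
        System.out.println("slope = " + model[0] + ", intercept = " + model[1]);
    }
}
```

In a BSP setting each task could run such updates on its own share of the
training set and then average the resulting weights with the other tasks
after each superstep.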

I will of course support you on this if you like. I'm fairly certain that
your way can work, but it will be slow as hell.
Just my usual two cents on various topics ;)

2012/9/20 顾荣 <gurongwalker@gmail.com>

> Hi, guys.
>
> As you are calling for application programs on Hama in the *Future
> Plans* section of the Hama programming wiki here (
>
> https://issues.apache.org/jira/secure/attachment/12528218/ApacheHamaBSPProgrammingmodel.pdf
> ),
> and I am very interested in machine learning, I plan to implement neural
> networks (e.g. Multilayer Perceptron with BP) on Hama. Hama seems to be a
> nice tool for training large-scale neural networks. Especially for those
> with a large structure (many hidden layers and many neurons), I find
> that Hama Graph provides a good solution. We can regard each neuron in
> the NN (neural network) as a vertex in Hama Graph, and the links between
> neurons as edges in the graph. The training process can then be regarded
> as updating the weights of the edges among the vertices. However, I
> encountered a problem in the current Hama Graph implementation.
>
> Let me explain this to you. As you may know, during the training process
> of many machine learning algorithms, we need to feed many training samples
> into the model one by one. Usually, more training samples lead to
> more precise models. However, as far as I know, the only input file
> interface provided by Hama Graph is the input for the graph structure.
> Sadly, it's hard to read and distribute the training samples at runtime,
> as users can only implement their computing logic by overriding some key
> functions such as compute() in the Vertex class. So, does Hama Graph
> provide any flexible file reading interface for users at runtime?
>
> Thanks in advance.
>
> Walker.
>
