incubator-hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: Hama receive queue
Date Fri, 18 May 2012 07:28:51 GMT
Cool, I'd be glad to help you along the way ;)
I just have a few notes:

> procId = Integer.parseInt(bspPeer.getPeerName().split(":")[1]);

This is a good observation, but in modes other than local mode the peer name
is a host:port tuple, so your "hack" won't work there. However, the peerNames
array returned by bspPeer.getAllPeerNames() is sorted on each task, so you
just have to look up the index of your own peer name, e.g. with:

procId = Arrays.binarySearch(bspPeer.getAllPeerNames(), bspPeer.getPeerName());
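A minimal, self-contained sketch of the difference between the two approaches. The peer names are made-up examples (assumption: distributed-mode names look like host:port, as described above); in a real job the array and the name would come from bspPeer.getAllPeerNames() and bspPeer.getPeerName(). PeerIndexDemo and its helper methods are hypothetical names used only for illustration.

```java
import java.util.Arrays;

public class PeerIndexDemo {

    // Fragile approach: treats the part after ':' as the task index.
    // This only works when peer names look like "<something>:<index>".
    static int indexFromName(String peerName) {
        return Integer.parseInt(peerName.split(":")[1]);
    }

    // Robust approach: every task sees the same sorted array of peer names,
    // so the position of our own name in it is a stable, unique index.
    static int indexFromSortedPeers(String[] sortedPeerNames, String myName) {
        int idx = Arrays.binarySearch(sortedPeerNames, myName);
        if (idx < 0) {
            throw new IllegalStateException("peer name not found: " + myName);
        }
        return idx;
    }

    public static void main(String[] args) {
        // Hypothetical distributed-mode names (host:port), already sorted.
        String[] peers = {"node1:61000", "node1:61001", "node2:61000"};
        String me = "node2:61000";

        // Prints 61000: a port number, not a task index.
        System.out.println(indexFromName(me));
        // Prints 2: the position of "node2:61000" in the sorted array.
        System.out.println(indexFromSortedPeers(peers, me));
    }
}
```

Arrays.binarySearch only gives a meaningful answer on a sorted array, which is exactly the guarantee described above for getAllPeerNames().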

As I said in my previous mail, I think you will need row partitioning of the
matrix. I wrote a very simplistic matrix multiplication in BSP [1]; if you
scroll down, you will see a partitioner based on the row number.
So your input file (I recommend SequenceFiles) has to use <IntWritable,
ArrayWritable/your ArrayMessage> as its key/value types.
The partitioner will take care of splitting the files accordingly and
assigning each split to a task.
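Thomas's row-number partitioner in [1] is not reproduced here. As a hypothetical sketch of the idea, assuming the row index is the record key and the total number of rows is known up front, the class below sends contiguous blocks of rows to each task, so every peer receives a horizontal slice of the matrix. The getPartition(key, value, numTasks) shape is modeled on Hama's Partitioner, but the interface is declared locally so the example compiles without Hama; check the real interface in your Hama version before adapting it.

```java
public class BlockRowPartitionerDemo {

    // Local stand-in for a partitioner interface (assumed shape, not Hama's class).
    interface RowPartitioner<V> {
        int getPartition(int rowIndex, V value, int numTasks);
    }

    // Assigns contiguous blocks of rows to tasks: with 10 rows and 3 tasks,
    // rows 0-3 go to task 0, rows 4-6 to task 1 and rows 7-9 to task 2.
    static class BlockRowPartitioner<V> implements RowPartitioner<V> {
        private final int numRows;

        BlockRowPartitioner(int numRows) {
            if (numRows <= 0) {
                throw new IllegalArgumentException("numRows must be positive");
            }
            this.numRows = numRows;
        }

        @Override
        public int getPartition(int rowIndex, V value, int numTasks) {
            if (rowIndex < 0 || rowIndex >= numRows) {
                throw new IllegalArgumentException("row out of range: " + rowIndex);
            }
            // Long arithmetic avoids overflow for very large matrices.
            return (int) ((long) rowIndex * numTasks / numRows);
        }
    }

    public static void main(String[] args) {
        BlockRowPartitioner<double[]> partitioner = new BlockRowPartitioner<>(10);
        for (int row = 0; row < 10; row++) {
            System.out.println("row " + row + " -> task " + partitioner.getPartition(row, null, 3));
        }
    }
}
```

A simpler alternative is rowIndex % numTasks (round-robin), which does not need the row count but scatters neighbouring rows across peers.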

[1]
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/bsp/MatrixMultiplicationBSP.java


2012/5/18 Aditya Sarawgi <sarawgi.aditya@gmail.com>

> Hi,
>
> The main optimization step is still left; I wanted to be sure that I got
> ICF right before moving ahead, since the time complexity of the entire
> algorithm is dominated by the ICF decomposition.
> I will update you guys soon when I have the final implementation done. I
> am eager to try it on datasets as well :)
>
>
> On Fri, May 18, 2012 at 1:44 AM, Thomas Jungblut <thomas.jungblut@googlemail.com> wrote:
>
> > Thanks for the explanation!
> > I have plenty of time today, so I can clone your stuff and play around
> > with it.
> > Are there any steps left before this can be used as an SVM? I wanted to
> > try it out on the mushroom set.
> >
> > 2012/5/18 Aditya Sarawgi <sarawgi.aditya@gmail.com>
> >
> > > @Edward it's not urgent, I am ready when you are :)
> > >
> > > @Thomas Thanks for the feedback and help. Sure, you can use the code
> > > for the JIRAs. But do remember that it is slightly different from the
> > > actual ICF in the sense that here the dimension of the result matrix
> > > would be n x p (where p is typically sqrt(n)), and the approximation
> > > error depends on p: if p is close to n, the error is low.
> > >
> > > It seems to work pretty well on smaller matrices. I tried varying the
> > > value of p, and as p approaches n, the decomposition has less error.
> > > I have to do some more testing, though.
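For readers unfamiliar with ICF (incomplete Cholesky factorization): it approximates a symmetric positive semi-definite n x n matrix K by G * G^T, where G is n x p. The sketch below is a plain sequential version with greedy pivoting on the largest remaining diagonal residual; it is not Aditya's parallel row-based implementation, and the pivoting rule and the tol parameter are assumptions. It illustrates the behaviour described above: for a positive-definite K with p = n the factorization is exact up to rounding, while a smaller p leaves a residual.

```java
public class IncompleteCholesky {

    // Pivoted incomplete Cholesky: returns an n x p matrix G with K ~ G * G^T.
    // K must be symmetric positive semi-definite. Stops early once the largest
    // remaining diagonal residual is <= tol (remaining columns stay zero).
    static double[][] factor(double[][] K, int p, double tol) {
        int n = K.length;
        double[][] G = new double[n][p];
        double[] d = new double[n];              // diagonal of the residual K - G G^T
        boolean[] pivoted = new boolean[n];
        for (int i = 0; i < n; i++) d[i] = K[i][i];

        for (int j = 0; j < p; j++) {
            // Greedy pivot: the unpivoted row with the largest residual.
            int piv = -1;
            for (int i = 0; i < n; i++) {
                if (!pivoted[i] && (piv < 0 || d[i] > d[piv])) piv = i;
            }
            if (piv < 0 || d[piv] <= tol) break;

            pivoted[piv] = true;
            double gpp = Math.sqrt(d[piv]);
            G[piv][j] = gpp;
            d[piv] = 0.0;

            for (int i = 0; i < n; i++) {
                if (pivoted[i]) continue;        // earlier pivots keep 0 in column j
                double s = K[i][piv];
                for (int k = 0; k < j; k++) s -= G[i][k] * G[piv][k];
                G[i][j] = s / gpp;
                d[i] -= G[i][j] * G[i][j];
            }
        }
        return G;
    }

    // Largest absolute entry of K - G G^T, used as the approximation error.
    static double maxError(double[][] K, double[][] G) {
        double worst = 0.0;
        for (int a = 0; a < K.length; a++) {
            for (int b = 0; b < K.length; b++) {
                double s = 0.0;
                for (int k = 0; k < G[0].length; k++) s += G[a][k] * G[b][k];
                worst = Math.max(worst, Math.abs(K[a][b] - s));
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double[][] K = {
            {4.0, 2.0, 0.6},
            {2.0, 5.0, 1.5},
            {0.6, 1.5, 3.0}
        };
        for (int p = 1; p <= 3; p++) {
            System.out.println("p=" + p + " max error=" + maxError(K, factor(K, p, 1e-12)));
        }
    }
}
```

Each column costs O(n * p) work, which is why p around sqrt(n) keeps the decomposition much cheaper than a full O(n^3) Cholesky.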
> > >
> > >
> > > On Thu, May 17, 2012 at 11:06 AM, Thomas Jungblut <thomas.jungblut@googlemail.com> wrote:
> > >
> > > > instanceof is slow as hell, but if you have no other solution then this
> > > > is okay.
> > > >
> > > > > 2) What is the standard way to load matrices onto different nodes
> > > > > with a custom partitioning scheme?
> > > >
> > > >
> > > > It depends on what your algorithm needs, but I think you will need to
> > > > implement your own partitioner, since hash partitioning may not suit
> > > > this ICF.
> > > > Generally, you need to use the input system to read a part of the
> > > > matrix into each peer.
> > > >
> > > > We are also scripting a MapReduce job that will create x GB of random
> > > > input to check scalability.
> > > > Here is the one for graphs: https://issues.apache.org/jira/browse/HAMA-558
> > > > But I think it is easily extendable to matrices. There is an issue for
> > > > that as well; I don't know how far Mikalai got with it.
> > > >
> > > > BTW your code looks good ;)
> > > >
> > > > Can we use this for https://issues.apache.org/jira/browse/HAMA-94 or
> > > > https://issues.apache.org/jira/browse/HAMA-553? It would be a great
> > > > addition if it works!
> > > >
> > > > Greetings from Germany,
> > > > Thomas
> > > >
> > > > 2012/5/17 Aditya Sarawgi <sarawgi.aditya@gmail.com>
> > > >
> > > > > Thanks Thomas.
> > > > > I am actually using tags for something else, so for now using
> > > > > instanceof is just fine with me.
> > > > >
> > > > > I had a couple more questions regarding benchmarking on Hama. I
> > > > > have a working implementation of parallel row-based ICF that, given
> > > > > an n x n matrix, returns a decomposed n x p matrix.
> > > > >
> > > > > https://github.com/truncs/hello-world/blob/master/src/main/java/edu/sunysb/cs/Icf.java
> > > > >
> > > > > Now I would like to test this on a big input, possibly in fully
> > > > > distributed mode, so I was wondering how people usually do this
> > > > > sort of benchmarking.
> > > > >
> > > > > Specifically:
> > > > > 1) Do they set up a cluster on AWS?
> > > > > 2) What is the standard way to load matrices onto different nodes
> > > > > with a custom partitioning scheme?
> > > > > 3) Is there anything else that I should know?
> > > > >
> > > > > On Thu, May 17, 2012 at 3:20 AM, Thomas Jungblut <thomas.jungblut@googlemail.com> wrote:
> > > > >
> > > > > > Hi Aditya,
> > > > > >
> > > > > > that's where the concept of message tagging comes into play. You have
> > > > > > tags in each message, which are hardcoded as Strings.
> > > > > > But as Edward said, you can use GenericWritable or ObjectWritable
> > > > > > instead, so they will tag your messages with the class names and give
> > > > > > you the correct class.
> > > > > >
> > > > > > > Is there any way I can pop from the receive queue?
> > > > > >
> > > > > >
> > > > > > peer.getCurrentMessage() pops from the receive queue.
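The receive pattern described here can be sketched without a cluster. FakePeer below is a hypothetical stand-in whose getCurrentMessage() removes and returns the head of a queue (null when empty), matching the popping behaviour described above; IntMessage and ArrayMessage are minimal local stand-ins, not Aditya's classes. Draining the queue completely and dispatching on the runtime type, rather than casting every message to one class, avoids the ClassCastException from the original question.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class DrainAndDispatch {

    // Minimal stand-ins for the message classes discussed in this thread.
    interface Message {}

    static final class IntMessage implements Message {
        final int value;
        IntMessage(int value) { this.value = value; }
    }

    static final class ArrayMessage implements Message {
        final double[] values;
        ArrayMessage(double[] values) { this.values = values; }
    }

    // Hypothetical peer: getCurrentMessage() removes and returns the head of
    // the receive queue, or null once the queue is empty.
    static final class FakePeer {
        private final Queue<Message> received = new ArrayDeque<>();
        void deliver(Message m) { received.add(m); }
        Message getCurrentMessage() { return received.poll(); }
    }

    // Drains the whole queue and dispatches on the runtime type instead of
    // casting blindly, so a leftover IntMessage cannot cause a ClassCastException.
    static int[] drain(FakePeer peer) {
        int ints = 0;
        int arrays = 0;
        Message msg;
        while ((msg = peer.getCurrentMessage()) != null) {
            if (msg instanceof IntMessage) {
                ints++;        // handle ((IntMessage) msg).value here
            } else if (msg instanceof ArrayMessage) {
                arrays++;      // handle ((ArrayMessage) msg).values here
            } else {
                throw new IllegalStateException("unexpected message: " + msg.getClass());
            }
        }
        return new int[] {ints, arrays};
    }

    public static void main(String[] args) {
        FakePeer peer = new FakePeer();
        peer.deliver(new IntMessage(7));
        peer.deliver(new ArrayMessage(new double[] {1.0, 2.0}));
        peer.deliver(new ArrayMessage(new double[] {3.0}));
        int[] counts = drain(peer);
        System.out.println(counts[0] + " int, " + counts[1] + " array");  // 1 int, 2 array
        System.out.println(peer.getCurrentMessage() == null);             // true
    }
}
```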
> > > > > >
> > > > > > 2012/5/17 Aditya Sarawgi <sarawgi.aditya@gmail.com>
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > But that's not the only problem. Consider the case where a variable
> > > > > > > number of messages is being sent: I would have to maintain counts
> > > > > > > for each peer pointing to the last unread message.
> > > > > > >
> > > > > > > Is there any way I can pop from the receive queue?
> > > > > > >
> > > > > > >
> > > > > > > On Wed, May 16, 2012 at 10:23 PM, Suraj Menon <surajsmenon@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Please take a look at this snippet of code, copied and modified
> > > > > > > > from the Mapper class, which implements your scenario:
> > > > > > > >
> > > > > > > > https://github.com/ssmenon/hama/edit/master/hama-mapreduce/src/org/apache/hama/computemodel/mapreduce/Trials.java
> > > > > > > >
> > > > > > > > Between lines 233 and 245 I am able to send different types of
> > > > > > > > messages. With type checks and generics you shouldn't encounter a
> > > > > > > > ClassCastException at the receiving end either. I have yet to test
> > > > > > > > the next superstep; I will update you with sample code for the next
> > > > > > > > superstep that mimics your receiving scenario.
> > > > > > > >
> > > > > > > > For elegance, we have an experimental Superstep#compute API
> > > > > > > > (org.apache.hama.bsp.Superstep). I have encountered an issue in the
> > > > > > > > job submission framework with this method in distributed mode; a fix
> > > > > > > > for this will be pushed to trunk in the next few hours. You can still
> > > > > > > > run it using LocalBSPRunner for now.
> > > > > > > >
> > > > > > > > -Suraj
> > > > > > > >
> > > > > > > > On Wed, May 16, 2012 at 9:18 PM, Aditya Sarawgi
> > > > > > > > <sarawgi.aditya@gmail.com> wrote:
> > > > > > > > > Hi Edward,
> > > > > > > > >
> > > > > > > > > Yes, that is what I did.
> > > > > > > > > I wrote an ArrayMessage class (it doesn't use generics for now, but
> > > > > > > > > can be converted easily):
> > > > > > > > >
> > > > > > > > > https://github.com/truncs/hello-world/blob/master/src/main/java/edu/sunysb/cs/ArrayMessage.java
> > > > > > > > >
> > > > > > > > > But the problem is that I am sending an IntegerMessage first, and
> > > > > > > > > after reading the IntegerMessage I send an ArrayMessage, but the
> > > > > > > > > previous IntegerMessage is still there.
> > > > > > > > >
> > > > > > > > > On Wed, May 16, 2012 at 8:34 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> > > > > > > > >
> > > > > > > > >> Hi,
> > > > > > > > >>
> > > > > > > > >> To send or receive multiple message types, I think you can use
> > > > > > > > >> GenericWritable. You can also implement your own GenericMessage
> > > > > > > > >> and contribute it to our project!
> > > > > > > > >>
> > > > > > > > >> Hope this helps you.
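The GenericWritable suggestion works because, roughly, Hadoop's GenericWritable writes a one-byte index into a fixed list of types before the wrapped value, so the receiver knows which class to instantiate. The sketch below mimics that idea with plain java.io so it runs without Hadoop or Hama; SimpleWritable, TaggedMessage and the payload classes are hypothetical names, and this is not the real GenericWritable API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

public class TaggedMessageDemo {

    // Local stand-in with the same shape as Hadoop's Writable (not the real interface).
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    public static final class IntPayload implements SimpleWritable {
        int value;
        public IntPayload() {}
        IntPayload(int value) { this.value = value; }
        public void write(DataOutput out) throws IOException { out.writeInt(value); }
        public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    }

    public static final class ArrayPayload implements SimpleWritable {
        double[] values = new double[0];
        public ArrayPayload() {}
        ArrayPayload(double[] values) { this.values = values; }
        public void write(DataOutput out) throws IOException {
            out.writeInt(values.length);
            for (double v : values) out.writeDouble(v);
        }
        public void readFields(DataInput in) throws IOException {
            values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
        }
    }

    // GenericWritable-style wrapper: writes a one-byte index into TYPES, then the payload.
    static final class TaggedMessage implements SimpleWritable {
        // Sender and receiver must agree on this order; only the index is sent.
        private static final List<Class<? extends SimpleWritable>> TYPES =
                List.of(IntPayload.class, ArrayPayload.class);
        private SimpleWritable instance;

        TaggedMessage() {}
        TaggedMessage(SimpleWritable instance) { this.instance = instance; }
        SimpleWritable get() { return instance; }

        public void write(DataOutput out) throws IOException {
            int index = TYPES.indexOf(instance.getClass());
            if (index < 0) throw new IOException("unregistered type: " + instance.getClass());
            out.writeByte(index);
            instance.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            int index = in.readByte();
            try {
                // Recreate the right class from its index, then let it read its own fields.
                instance = TYPES.get(index).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException(e);
            }
            instance.readFields(in);
        }
    }

    // Serializes a payload through TaggedMessage and reads it back.
    static SimpleWritable roundTrip(SimpleWritable payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            new TaggedMessage(payload).write(out);
            out.flush();
            TaggedMessage copy = new TaggedMessage();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IntPayload a = (IntPayload) roundTrip(new IntPayload(42));
        ArrayPayload b = (ArrayPayload) roundTrip(new ArrayPayload(new double[] {1.5, 2.5}));
        System.out.println(a.value);                     // 42
        System.out.println(Arrays.toString(b.values));   // [1.5, 2.5]
    }
}
```

A real GenericWritable subclass returns its list of classes from getTypes(); as in the sketch, the order of that list has to be identical on sender and receiver, since only the index travels over the wire.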
> > > > > > > > >>
> > > > > > > > >> On Thu, May 17, 2012 at 7:48 AM, Aditya Sarawgi
> > > > > > > > >> <sarawgi.aditya@gmail.com> wrote:
> > > > > > > > >> > Hi Guys,
> > > > > > > > >> >
> > > > > > > > >> > I am wondering how the receive queues in Hama work. Consider the
> > > > > > > > >> > case where I want to send a different type of BSPMessage in 2
> > > > > > > > >> > consecutive supersteps.
> > > > > > > > >> > In the first superstep I am sending an IntMessage, and in the next
> > > > > > > > >> > one I am sending an ArrayMessage (a custom message class).
> > > > > > > > >> >
> > > > > > > > >> > Now in the second superstep, when I do a
> > > > > > > > >> >  while ((arrayMessage = (ArrayMessage) peer.getCurrentMessage()) != null) {
> > > > > > > > >> >
> > > > > > > > >> > it throws a java.lang.ClassCastException, which is obvious since
> > > > > > > > >> > it is trying to cast an IntMessage to an ArrayMessage.
> > > > > > > > >> > I thought a message is dropped from the queue after it is read; is
> > > > > > > > >> > this not the case?
> > > > > > > > >> > And if it is not, how can this be handled elegantly?
> > > > > > > > >> >
> > > > > > > > >> > --
> > > > > > > > >> > Cheers,
> > > > > > > > >> > Aditya Sarawgi
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Best Regards, Edward J. Yoon
> > > > > > > > >> @eddieyoon
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Cheers,
> > > > > > > > > Aditya Sarawgi
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Cheers,
> > > > > > > Aditya Sarawgi
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Thomas Jungblut
> > > > > > Berlin <thomas.jungblut@gmail.com>
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Cheers,
> > > > > Aditya Sarawgi
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thomas Jungblut
> > > > Berlin <thomas.jungblut@gmail.com>
> > > >
> > >
> > >
> > >
> > > --
> > > Cheers,
> > > Aditya Sarawgi
> > >
> >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin <thomas.jungblut@gmail.com>
> >
>
>
>
> --
> Cheers,
> Aditya Sarawgi
>



-- 
Thomas Jungblut
Berlin <thomas.jungblut@gmail.com>
