Reply-To: hama-dev@incubator.apache.org
Date: Fri, 18 May 2012 09:28:51 +0200
Subject: Re: Hama receive queue
From: Thomas Jungblut
To: hama-dev@incubator.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=14dae9cfcba0ff3aed04c04a8006

--14dae9cfcba0ff3aed04c04a8006
Content-Type: text/plain; charset=ISO-8859-1

Cool, I'd be glad to help you on the way ;)
Just a few notes:

> procId = Integer.parseInt(bspPeer.getPeerName().split(":")[1]);

This is a good observation, but in modes other than local mode the peer
name is a host:port tuple. So your "hack" won't work, but the peerNames
array returned by "bspPeer.getAllPeerNames()" is sorted on each task, so
you just have to get the index of your peer name.
e.g. with:

    procId = Arrays.binarySearch(bspPeer.getAllPeerNames(), bspPeer.getPeerName());

As I said in my previous mail, I think you will need a row partitioning of
the matrix.
I made a very simplistic matrix multiplication in BSP [1]; if you scroll
down, you will see a partitioner based on row number.
So your input file (I recommend sequencefiles) have to be as input type.
The partitioner will take care of splitting the files accordingly and
assigning each split to a task.

[1] https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/bsp/MatrixMultiplicationBSP.java

2012/5/18 Aditya Sarawgi

> Hi,
>
> The main optimization step is still left, I wanted to be sure that I get
> ICF right before moving ahead. And the time complexity of the entire
> algorithm is dominated by ICF decomposition.
> Will update you guys soon when I have the final implementation done, I
> am eager to try it on datasets as well :)
>
> On Fri, May 18, 2012 at 1:44 AM, Thomas Jungblut <thomas.jungblut@googlemail.com> wrote:
>
> > Thanks for the explanation!
> > I have plenty of time today so I can clone your stuff and play arround with it.
> > Are there any steps left to use this as SVM? I wanted to try it out on the
> > mushroom set.
> >
> > 2012/5/18 Aditya Sarawgi
> >
> > > @Edward its not urgent, I am ready when you are :)
> > >
> > > @Thomas Thanks for the feedback and help. Sure, you can use the code
> > > for the jiras. But do remember it is slightly different from the actual icf
> > > in the sense
> > > that here the dimension of the result matrix would n x p ( where p is
> > > typically sqrt(n) )
> > > and the approximation error changes with what p. If p is close to n the
> > > error is low.
> > >
> > > It seems to work on smaller matrices pretty well. I tried it by varying the
> > > values of p and
> > > as p approaches n, the decomposition has less error.
> > > I have to do some more testing though.
> > >
> > > On Thu, May 17, 2012 at 11:06 AM, Thomas Jungblut <thomas.jungblut@googlemail.com> wrote:
> > >
> > > > instanceof is slow as hell, but if you have no other solution then this is
> > > > okay.
> > > >
> > > > > 2) What is like the standard way to load matrices in different nodes with a
> > > > > custom partitioning scheme
> > > >
> > > > It is depending on your algorithm needs, but I think you will need to
> > > > implement your own partitioner, since HashPartitioning may not apply to
> > > > this ICF.
> > > > Generally you need to use the input system to read a part of a matrix into
> > > > each peer.
> > > >
> > > > We also script a mapreduce job that will create random input for x GB to
> > > > check scalability.
> > > > Here is that for graphs: https://issues.apache.org/jira/browse/HAMA-558
> > > > But I think this is easily extendable to matrices. There is an issue for
> > > > that as well, I don't know how far Mikalai came with that.
> > > >
> > > > BTW your code looks good ;)
> > > >
> > > > Can we use this for https://issues.apache.org/jira/browse/HAMA-94 or
> > > > https://issues.apache.org/jira/browse/HAMA-553 ? Would be a great addition
> > > > if it works!
> > > >
> > > > Greetings from Germany,
> > > > Thomas
> > > >
> > > > 2012/5/17 Aditya Sarawgi
> > > >
> > > > > Thanks Thomas.
> > > > > I am actually using tags for something else. So for now using instanceof is
> > > > > just fine with me.
> > > > >
> > > > > I had a couple of more questions, regarding benchmarking stuff on hama. I
> > > > > have a working implementation of
> > > > > Parallel row based icf that given a n x n matrix returns a decomposed n x p
> > > > > matrix.
> > > > >
> > > > > https://github.com/truncs/hello-world/blob/master/src/main/java/edu/sunysb/cs/Icf.java
> > > > >
> > > > > Now I would like to test this on a big input and possibly in full
> > > > > distributed mode, so I was wondering how do
> > > > > people usually do these sort of benchmarking.
> > > > >
> > > > > Specifically,
> > > > > 1) Do they setup a cluster on AWS ?
> > > > > 2) What is like the standard way to load matrices in different nodes with a
> > > > > custom partitioning scheme
> > > > > 3) Is there anything else that I should know
> > > > >
> > > > > On Thu, May 17, 2012 at 3:20 AM, Thomas Jungblut <thomas.jungblut@googlemail.com> wrote:
> > > > >
> > > > > > Hi Aditya,
> > > > > >
> > > > > > that's where the concept of Message Tagging comes into play. You have tags
> > > > > > in each message which are hardcoded as Strings.
> > > > > > But as Edward told you can use GenericWritable or ObjectWritable instead,
> > > > > > so they will tag your messages with the classnames and give you the correct
> > > > > > class.
> > > > > >
> > > > > > > Is there any way by which I can pop from the receive queue ?
> > > > > >
> > > > > > peer.getCurrentMessage() is popping from the received queue.
> > > > > >
> > > > > > 2012/5/17 Aditya Sarawgi
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > But thats not the only problem, consider this case
> > > > > > > that there are variable number of messages being sent, so I would have to
> > > > > > > maintain
> > > > > > > counts for each peer pointing to the last unread message.
> > > > > > >
> > > > > > > Is there any way by which I can pop from the receive queue ?
> > > > > > >
> > > > > > > On Wed, May 16, 2012 at 10:23 PM, Suraj Menon <surajsmenon@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Please take a look at this snippet of code copied and modified from
> > > > > > > > Mapper class to implement your scenario. -
> > > > > > > >
> > > > > > > > https://github.com/ssmenon/hama/edit/master/hama-mapreduce/src/org/apache/hama/computemodel/mapreduce/Trials.java
> > > > > > > > Between lines 233 to 245 I am able to send different type of messages.
> > > > > > > > With type checks and generics you shouldn't be encountering Classcast
> > > > > > > > exception at receiving end too. I am yet to test the next superstep,
> > > > > > > > shall update you with sample code for the next superstep mimicking
> > > > > > > > your scenario for receiving.
> > > > > > > >
> > > > > > > > For elegance, we have an experimental Superstep#compute
> > > > > > > > API(org.apache.hama.bsp.Superstep). I have encountered an issue in job
> > > > > > > > submission framework with this method in distributed mode; fix for
> > > > > > > > this would be pushed to trunk in next few hours. You can still run it
> > > > > > > > using LocalBSPRunner for now.
> > > > > > > >
> > > > > > > > -Suraj
> > > > > > > >
> > > > > > > > On Wed, May 16, 2012 at 9:18 PM, Aditya Sarawgi
> > > > > > > > wrote:
> > > > > > > > > Hi Edward,
> > > > > > > > >
> > > > > > > > > Yes that is what I did
> > > > > > > > > I wrote an ArrayMessage class (doesn't use generics for now but can be
> > > > > > > > > converted easily)
> > > > > > > > >
> > > > > > > > > https://github.com/truncs/hello-world/blob/master/src/main/java/edu/sunysb/cs/ArrayMessage.java
> > > > > > > > >
> > > > > > > > > But the problem is that I am sending a IntegerMessage before and after
> > > > > > > > > reading the IntegerMessage I am sending
> > > > > > > > > an ArrayMessage but the previous IntegerMessage is still there.
> > > > > > > > >
> > > > > > > > > On Wed, May 16, 2012 at 8:34 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> > > > > > > > >
> > > > > > > > >> Hi,
> > > > > > > > >>
> > > > > > > > >> To send or receive multiple Message types, I think you can use
> > > > > > > > >> GenericWritable. You can also implement your own GenericMessage and
> > > > > > > > >> contribute it to our project!
> > > > > > > > >>
> > > > > > > > >> Hope this helps you.
> > > > > > > > >>
> > > > > > > > >> On Thu, May 17, 2012 at 7:48 AM, Aditya Sarawgi
> > > > > > > > >> wrote:
> > > > > > > > >> > Hi Guys,
> > > > > > > > >> >
> > > > > > > > >> > I am wondering how do the receive queues in hama work. Consider this case
> > > > > > > > >> > that I want to sent a different type of BSPMessage in 2 consecutive
> > > > > > > > >> > superstep.
> > > > > > > > >> > In this first superstep I am sending IntMessage and in the next one I am
> > > > > > > > >> > sending a ArrayMessage ( custom message class).
> > > > > > > > >> >
> > > > > > > > >> > Now in the second super step when I do a
> > > > > > > > >> > while ((arrayMessage = (ArrayMessage) peer.getCurrentMessage()) != null) {
> > > > > > > > >> >
> > > > > > > > >> > it is throwing a java.lang.ClassCastException, which is obvious since its
> > > > > > > > >> > trying to cast IntMessage to ArrayMessage.
> > > > > > > > >> > I thought the message is dropped from the queue after it is read, is this
> > > > > > > > >> > not the case ?
> > > > > > > > >> > And if it is not, how can this be handled elegantly ?
> > > > > > > > >> >
> > > > > > > > >> > --
> > > > > > > > >> > Cheers,
> > > > > > > > >> > Aditya Sarawgi
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Best Regards, Edward J. Yoon
> > > > > > > > >> @eddieyoon
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Cheers,
> > > > > > > > > Aditya Sarawgi
> > > > > > >
> > > > > > > --
> > > > > > > Cheers,
> > > > > > > Aditya Sarawgi
> > > > > >
> > > > > > --
> > > > > > Thomas Jungblut
> > > > > > Berlin
> > > > >
> > > > > --
> > > > > Cheers,
> > > > > Aditya Sarawgi
> > > >
> > > > --
> > > > Thomas Jungblut
> > > > Berlin
> > >
> > > --
> > > Cheers,
> > > Aditya Sarawgi
> >
> > --
> > Thomas Jungblut
> > Berlin
>
> --
> Cheers,
> Aditya Sarawgi

--
Thomas Jungblut
Berlin

--14dae9cfcba0ff3aed04c04a8006--
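
The peer-index suggestion from the top of this mail (look up your own name in the sorted peer list instead of parsing the port) can be tried out in plain Java without a Hama dependency. What follows is only a minimal sketch under stated assumptions: the class name PeerIndex, the method indexOf, and the host:port values are hypothetical stand-ins for what bspPeer.getAllPeerNames() and bspPeer.getPeerName() would return, and the list is assumed to be sorted in natural String order, which is what Arrays.binarySearch requires.

```java
import java.util.Arrays;

final class PeerIndex {

  // Returns the position of peerName in the sorted peer list, or -1 if it is absent.
  // Assumption: sortedPeerNames is in natural String order on every task.
  static int indexOf(String[] sortedPeerNames, String peerName) {
    int idx = Arrays.binarySearch(sortedPeerNames, peerName);
    return idx >= 0 ? idx : -1;
  }

  public static void main(String[] args) {
    // Hypothetical host:port peer names; two peers share port 61000 on different hosts,
    // so a procId parsed from the port alone would not be unique.
    String[] peers = {"node-a:61000", "node-b:61000", "node-b:61001"};
    System.out.println(indexOf(peers, "node-b:61000")); // prints 1
  }
}
```

Because every task sees the same sorted list, each peer would derive the same numbering independently, with ids running from 0 to the number of peers minus one, which is convenient for a row-based partitioning.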