From: Avery Ching <aching@yahoo-inc.com>
To: giraph-user@incubator.apache.org
Date: Tue, 6 Sep 2011 08:49:39 -0700
Subject: Re: Test message

Answers are inlined. No vacation for you this weekend I guess =).

On Sep 6, 2011, at 2:14 AM, Jake Mannix wrote:

Hi Avery,

  Thanks for the quick response!

On Mon, Sep 5, 2011 at 11:39 PM, Avery Ching <aching@yahoo-inc.com> wrote:

Hi Jake,

Giraph currently uses a lot of memory, but we're working on it in a few JIRAs. That being said, there are a few things that you can do to get some fairly large data sets going.

  Which JIRAs?

https://issues.apache.org/jira/browse/GIRAPH-11 - Balancing memory consumption among workers
https://issues.apache.org/jira/browse/GIRAPH-12 - Communication threads consuming too much memory

If you have a 64-bit JVM for your task trackers, that is much better; otherwise you are limited to 4 GB (like me).

  Limiting each mapper to 4GB is fine, because in theory most clusters run with totalRAMperBox / numCores < 4GB anyway (certainly true for our cluster). When multiple mappers are on the same physical box, do they still communicate via RPC?

Currently, yes.
I was able to run the org.apache.giraph.benchmark.PageRankBenchmark with 300 workers and 1 billion vertices with a single edge and 20 supersteps. Here are the parameters I used for our configuration:

  1B vertices with just *one* edge? What kind of graph would this be???

Very lame, I agree. Just for numbers =). Once some of the memory issues are worked out, we'll test more connected graphs.

  The PageRankBenchmark code runs with synthetic graph data it generates on the fly, right?

Yes.

  I'm having it read the graph data from HDFS, where I can see how big it is on disk, into RAM, by subclassing SimplePageRankVertex. So my graph may be a bit poorly balanced (I'll add some logging to see).

hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.totalInputSplitMultiplier=0.001 -Dmapred.child.java.opts="-Xms1800m -Xmx1800m -Xss64k" -Dmapred.job.map.memory.mb=4608 -Dgiraph.checkpointFrequency=0 -Dgiraph.pollAttempts=20 -e 1 -s 20 -v -V 1000000000 -w 300

Your parameters will likely vary based on how much memory you have and your Hadoop configuration. Our machines have 16 GB I think, but I only have 4 GB due to the 32-bit limit. Using mapred.job.map.memory.mb allows me to steal more map slots per node to give me more memory per map slot. -Xss to reduce the thread stack size will help a LOT.

  I'll try to see if -Xss64k helps, thanks. We typically run with 3GB heap per mapper, but they're beefy machines, so this is really what everyone gets (a bit overkill, probably, but we have some folk running pretty memory-intensive tasks...)

Beefy is good. One thing, though: we currently create as many communication threads as there are workers, so with n workers each worker creates n threads (hence GIRAPH-12). So we can't use all the memory for heap; we have to save some for the thread stacks as well for now.
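To see why -Xss matters here, a quick back-of-the-envelope sketch in plain Java, using the -w 300 worker count from the benchmark command above; the 512 KB "default" stack size is an assumed typical 32-bit JVM value, not something from this thread:

```java
// Back-of-the-envelope: memory reserved for thread stacks on one worker,
// given that each worker currently keeps one communication thread per peer
// (the GIRAPH-12 issue discussed above).
public class StackBudget {
    // Total stack reservation in MB for one worker with `workers` peer threads.
    static long stackMb(int workers, long stackKb) {
        return workers * stackKb / 1024;
    }

    public static void main(String[] args) {
        // -w 300 from the command above; 512 KB is an assumed 32-bit default.
        System.out.println("default stack (~512k): ~" + stackMb(300, 512) + " MB of stacks");
        System.out.println("with -Xss64k:          ~" + stackMb(300, 64) + " MB of stacks");
    }
}
```

With 300 peer threads, shrinking the per-thread stack from ~512 KB to 64 KB frees on the order of 130 MB per worker, which is real money inside a 1.8 GB heap budget.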
Another thing that could cause memory issues is an imbalance in the input data across the input splits (until https://issues.apache.org/jira/browse/GIRAPH-11 is resolved). Hopefully each input split is fairly balanced; otherwise, you might want to rebalance the input splits for now.

  That JIRA ticket seems to be talking about sorted inputs - is this a requirement (that vertex ids are sorted in each split), or does it just make some things run much better? What does this have to do with balanced input splits?

Yes, currently inputs must be sorted (a requirement) until GIRAPH-12 is finished. Balancing the input splits (memory consumed per input split) helps keep the amount of memory similar on all workers; assuming a homogeneous worker set, this allows larger data sets to fit into memory.

We haven't investigated memory improvements using primitives versus objects. I'm curious myself to see how much extra memory we are using at the cost of flexibility. That being said, I think flexibility is pretty important for users, and I'm not sure how to maintain both choices nicely.

  In Mahout, we've had to spend a fair amount of time early on trimming down all of our Java objects, and we live in a world where a lot of the time all we have are arrays of primitives. It's helped quite a bit with performance, but it's not really that limiting, actually, as long as you follow the "one additional layer of indirection" tactic: translate all of your static state into "ids" of some sort (i.e. normalize your data), so things like Strings get turned into int termIds, and ditto for various other Feature objects. It does require keeping track of a dictionary at the end of the day, to translate all of your internal ids back into User objects, or Documents, etc. But this is what is done in Lucene and databases anyway.
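The "one additional layer of indirection" tactic Jake describes can be sketched in a few lines of plain Java. This is an illustration only; the class and method names are made up for this sketch, not Mahout's actual API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the id-dictionary tactic: intern each String once,
// carry dense int ids everywhere else, and translate back only at the edges.
public class TermDictionary {
    private final Map<String, Integer> idOf = new HashMap<>();
    private final List<String> termOf = new ArrayList<>();

    // Return the existing id for a term, or assign the next dense id.
    public int intern(String term) {
        Integer id = idOf.get(term);
        if (id == null) {
            id = termOf.size();
            idOf.put(term, id);
            termOf.add(term);
        }
        return id;
    }

    // Reverse lookup, for presenting results back to users.
    public String term(int id) {
        return termOf.get(id);
    }
}
```

After normalization, the hot loops carry only int[] arrays of term ids; the dictionary is consulted once on the way in and once on the way out, which is essentially what a term dictionary in Lucene or a surrogate key in a database does.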
  I guess I'm not sure whether you *need* to give up the OO API; it's really nice to have completely strongly typed graph primitives. I'll just see if I can implement the same API using internal data structures that are nice and compact (as a performance improvement only) in the case where all of the type parameters are primitive-able. So, for example, something like PrimitiveVertex<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> implements MutableVertex<LongWritable, DoubleWritable, FloatWritable, DoubleWritable>. Still API-compatible, but internally it takes advantage of the fact that all types are wrapped primitives.

Interesting idea; I would like to see the results of this experiment.

I'm glad to hear you're trying out Giraph at Twitter. Please keep us aware of any problems you run into and we'll try to help.

  Definitely, thanks. We've got some relatively big graphs; I'd be happy to report our "stress-testing" of this project. :)

=)

  -jake

Thanks,

Avery

On Sep 5, 2011, at 10:49 PM, Jake Mannix wrote:

> Greetings Giraphians!
>
>   I'm trying out some simple PageRank tests of Giraph on our cluster here at Twitter, and I'm wondering what the data-size blow-up is usually expected to be for the on-disk to in-memory graph representation. I tried running a pretty tiny (a single part-file, 2GB big, which had 8 splits) SequenceFile of my own binary data (if you're curious, it's a Mahout SequenceFile<IntWritable, VectorWritable>), which stores the data pretty minimally: an on-disk primitive int "vertex id", a target vertex id that is also just an int, and edges that carry only an 8-byte double as payload.
>
>   But we've got 3GB of RAM for our mappers, and some of my 8 workers are running out of memory. Even if the *entire* part file was in one split, it's only 2GB on disk, so I'm wondering how much attention has been paid to memory usage in the abstract base class org.apache.giraph.graph.Vertex?
> It looks like, on account of being very flexible in terms of types for the vertices and edges, keeping a big TreeMap means each int-double pair (dest vertex id + edge weight) is getting turned into a bunch of Java objects, and this is where the blow-up is coming from?
>
>   I wonder if a few special-purpose Java primitive MutableVertex implementations would be useful for me to contribute, to conserve a bit of memory? If I'm mistaken in my assumptions here (or there is already work done on this), just let me know. But if not, I'd love to help get Giraph running on some nice beefy data sets (with simplistic data models: vertex ids being simply ints/longs, and edge weights and messages to pass being similarly just booleans, floats, or doubles), because I've got some stuff I'd love to throw in memory and crank some distributed computations on. :)
>
>   - jake / @pbrane
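The blow-up Jake describes (a TreeMap of boxed per-edge objects) versus the primitive-backed alternative he proposes can be sketched like this. This is a standalone illustration of the idea under discussion, not actual Giraph code:

```java
import java.util.Arrays;

// Sketch of a primitive-backed edge list: two parallel sorted arrays instead
// of a TreeMap<IntWritable, DoubleWritable>. That is one object plus two
// arrays per vertex, rather than roughly four heap objects (tree entry,
// boxed key, boxed value, wrapper) per edge.
public class PrimitiveEdgeList {
    private final int[] targets;     // destination vertex ids, sorted ascending
    private final double[] weights;  // edge weights, parallel to targets

    public PrimitiveEdgeList(int[] targets, double[] weights) {
        this.targets = targets;
        this.weights = weights;
    }

    // Binary search stands in for TreeMap.get(); still O(log n) per lookup.
    public double weightOf(int target) {
        int i = Arrays.binarySearch(targets, target);
        return i >= 0 ? weights[i] : Double.NaN;
    }

    public int numEdges() {
        return targets.length;
    }
}
```

The trade-off is exactly the flexibility question raised above: this layout only works when the type parameters are primitive-able, whereas the TreeMap-of-Writables design works for any vertex/edge types.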