giraph-user mailing list archives

From Jeff Peters <>
Subject Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph
Date Thu, 29 Aug 2013 23:53:24 GMT
Avery, it would seem that optimizations to Giraph have, unfortunately,
turned the majority of the heap into "dark matter". The two snapshots are
at unknown points in a superstep but I waited for several supersteps so
that the activity had more or less stabilized. About the only thing
comparable between the two snapshots is the vertex count: 192561 X
"RecsVertex" in the new version and 191995 X "Coloring" in the old system.
But with the new Giraph, 672710176 out of 824886184 bytes (roughly 82%) are
stored as primitive byte arrays. That's probably indicative of some very fine
performance optimization work, but it makes it extremely difficult to know
what's really out there, and why. I did notice that a number of caches have
appeared that did not exist before,
namely SendEdgeCache, SendPartitionCache, SendMessageCache
and SendMutationsCache.
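
For anyone wanting to repeat the measurement, here is a minimal sketch of how the byte-array share can be tallied from a class histogram. It assumes rows shaped like `jmap -histo` output (rank, instance count, byte count, class name); the class and method names are made up for illustration and are not part of Giraph.

```java
import java.util.List;

// Sketch: what share of a heap histogram is primitive byte arrays ("[B")?
final class ByteArrayShare {
    // Parses rows shaped like "   1:   <instances>   <bytes>   <class name>".
    // Returns {bytes attributed to className, bytes summed over all parsed rows}.
    static long[] tally(List<String> lines, String className) {
        long matched = 0, total = 0;
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip the header, the separator rule, and the trailing "Total" row.
            if (cols.length < 4 || !cols[0].endsWith(":")) continue;
            long bytes = Long.parseLong(cols[2]);
            total += bytes;
            if (cols[3].equals(className)) matched += bytes;
        }
        return new long[] {matched, total};
    }

    static double percent(long part, long whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        // The two figures quoted in this message.
        long byteArrays = 672_710_176L, heap = 824_886_184L;
        System.out.printf("byte[] share of heap: %.1f%%%n", percent(byteArrays, heap));
    }
}
```

With the two figures above this comes to a little under 82%, which leaves only about 145 MiB of the snapshot attributed to anything with a recognizable class name.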

Could any of those account for a larger per-worker footprint in a modern
Giraph? Should I simply assume that I need to force AWS to configure its
EMR Hadoop so that each instance has fewer map tasks but with a somewhat
larger VM max, say 3GB instead of 2GB?
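
A rough sketch of that trade-off, using the figures quoted further down this thread (8 nodes, 14 map tasks per node, 2GB heap each). It assumes the per-node heap budget stays fixed and ignores everything else running on the node; the class name is hypothetical. In Hadoop 1.x the two knobs involved are normally `mapred.child.java.opts` (the task heap) and `mapred.tasktracker.map.tasks.maximum` (the slots per node); how EMR exposes them is not covered here.

```java
// Sketch: how many map slots fit per node if the per-task heap grows
// while the node's total heap budget stays fixed (an assumption).
final class SlotBudget {
    static int slotsPerNode(int currentSlots, int currentHeapMb, int newHeapMb) {
        if (newHeapMb <= 0) {
            throw new IllegalArgumentException("newHeapMb must be positive");
        }
        // Integer division rounds down: a partial slot is not usable.
        return (currentSlots * currentHeapMb) / newHeapMb;
    }

    public static void main(String[] args) {
        int nodes = 8, slots = 14, heapMb = 2048; // figures from this thread
        int proposedHeapMb = 3072;                // "say 3GB instead of 2GB"
        int newSlots = slotsPerNode(slots, heapMb, proposedHeapMb);
        System.out.println(newSlots + " slots per node, " + nodes * newSlots + " in total");
    }
}
```

Under those assumptions the cluster would drop from 112 map tasks to 72, so each worker would also hold a larger share of the graph.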

On Wed, Aug 28, 2013 at 4:57 PM, Avery Ching <> wrote:

> Try dumping a histogram of memory usage from a running JVM and see where
> the memory is going.  I can't think of anything in particular that
> changed...
> On 8/28/13 4:39 PM, Jeff Peters wrote:
>> I am tasked with updating our ancient (circa 7/10/2012) Giraph to
>> giraph-release-1.0.0-RC3. Most jobs run fine but our largest job now runs
>> out of memory using the same AWS elastic-mapreduce configuration we have
>> always used. I have never tried to configure either Giraph or the AWS
>> Hadoop. We build for Hadoop 1.0.2 because that's closest to the 1.0.3 AWS
>> provides us. The 8 X m2.4xlarge cluster we use seems to provide 8*14=112
>> map tasks fitted out with 2GB heap each. Our code is completely unchanged
>> except as required to adapt to the new Giraph APIs. Our vertex, edge, and
>> message data are completely unchanged. On the smaller jobs that work, the
>> aggregate heap usage high-water mark seems about the same as before, but
>> the "committed heap" seems to run higher. I can't even make it work on a
>> cluster of 12. In that case I get one map task that seems to end up with
>> nearly twice as many messages as most of the others so it runs out of
>> memory anyway. It only takes one to fail the job. Am I missing something
>> here? Should I be configuring my new Giraph in some way I didn't need to
>> with the old one?
