From: Avery Ching <aching@apache.org>
To: user@giraph.apache.org
Date: Fri, 30 Aug 2013 16:54:29 -0700
Subject: Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

Ah, the new caches. =)  These make things a lot faster (bulk data sending), but they do take up some additional memory.  If you look at GiraphConstants, you can find ways to change the cache sizes (which will reduce that memory usage).
For example, MAX_EDGE_REQUEST_SIZE affects the size of the edge cache and MAX_MSG_REQUEST_SIZE affects the size of the message cache.  The caches are kept per destination worker, so a job with 100 workers would use roughly 50 MB of cache on each worker by default.  Feel free to trim them if you like.
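Here is an untested sketch of what that could look like in job setup code (the property keys and the 256 KB value are my assumptions; please double-check them against GiraphConstants in your tree):

    import org.apache.giraph.conf.GiraphConfiguration;

    // Sketch: shrink the per-destination-worker request buffers from the
    // (assumed) 512 KB default down to 256 KB to cut the cache memory footprint.
    GiraphConfiguration conf = new GiraphConfiguration();
    conf.setInt("giraph.msgRequestSize", 256 * 1024);   // see MAX_MSG_REQUEST_SIZE
    conf.setInt("giraph.edgeRequestSize", 256 * 1024);  // see MAX_EDGE_REQUEST_SIZE

The same keys can also be passed as custom arguments on the GiraphRunner command line if you'd rather not touch code.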

The byte arrays for the edges are the most space-efficient storage possible (although not as performant as the native edge stores).
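If you ever want to trade some of that space efficiency for speed, a rough sketch of selecting one of the primitive edge stores instead (the class and method names here are from my memory of the 1.0 API, e.g. setVertexEdgesClass and LongDoubleArrayEdges, so treat them as assumptions):

    import org.apache.giraph.conf.GiraphConfiguration;
    import org.apache.giraph.edge.LongDoubleArrayEdges;

    // Sketch: use a primitive long/double edge store instead of the default
    // byte-array store; this assumes LongWritable vertex ids and DoubleWritable
    // edge values, which may not match your job.
    GiraphConfiguration conf = new GiraphConfiguration();
    conf.setVertexEdgesClass(LongDoubleArrayEdges.class);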

Hope that helps,

Avery

On 8/29/13 4:53 PM, Jeff Peters wrote:
Avery, it would seem that optimizations to Giraph have, unfortunately, turned the majority of the heap into "dark matter". The two snapshots were taken at unknown points in a superstep, but I waited for several supersteps so that the activity had more or less stabilized. About the only thing comparable between the two snapshots is the vertices: 192561 instances of "RecsVertex" in the new version and 191995 instances of "Coloring" in the old system. But with the new Giraph, 672710176 out of 824886184 bytes are stored in primitive byte arrays. That's probably indicative of some very fine performance optimization work, but it makes it extremely difficult to know what's really out there, and why. I did notice that a number of caches have appeared that did not exist before, namely SendEdgeCache, SendPartitionCache, SendMessageCache and SendMutationsCache.

Could any of those account for a larger per-worker footprint in a modern Giraph? Should I simply assume that I need to force AWS to configure its EMR Hadoop so that each instance has fewer map tasks but with a somewhat larger VM max, say 3GB instead of 2GB?
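For what it's worth, on a Hadoop 1.x cluster that trade-off usually comes down to the two mapred-site properties below; the slot count and heap size shown are purely illustrative, not a recommendation:

    <!-- mapred-site.xml sketch: fewer map slots per node, each with a larger heap.
         The values (7 slots, 3 GB) are illustrative only. -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>7</value>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx3g</value>
    </property>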


On Wed, Aug 28, 2013 at 4:57 PM, Avery Ching <aching@apache.org> wrote:
Try dumping a histogram of memory usage from a running JVM and see where the memory is going.  I can't think of anything in particular that changed...
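If it helps, the quickest way to get that histogram is jmap against the map task's JVM (on Hadoop 1.x the task JVM usually shows up in jps as "Child"):

    jps                                  # find the pid of the map task JVM
    jmap -histo:live <pid> | head -40    # object counts and shallow sizes by class

That usually makes it obvious which classes (or byte[] buffers) dominate the heap.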


On 8/28/13 4:39 PM, Jeff Peters wrote:

I am tasked with updating our ancient (circa 7/10/2012) Giraph to giraph-release-1.0.0-RC3. Most jobs run fine, but our largest job now runs out of memory using the same AWS elastic-mapreduce configuration we have always used. I have never tried to configure either Giraph or the AWS Hadoop. We build for Hadoop 1.0.2 because that's closest to the 1.0.3 AWS provides us. The 8 x m2.4xlarge cluster we use seems to provide 8 * 14 = 112 map tasks, each fitted out with a 2 GB heap. Our code is completely unchanged except as required to adapt to the new Giraph APIs. Our vertex, edge, and message data are completely unchanged. On smaller jobs that work, the aggregate heap-usage high-water mark seems about the same as before, but the "committed heap" seems to run higher. I can't even make it work on a cluster of 12. In that case I get one map task that seems to end up with nearly twice as many messages as most of the others, so it runs out of memory anyway. It only takes one failed task to fail the job. Am I missing something here? Should I be configuring the new Giraph in some way I didn't need to with the old one?



