From: Claudio Martella
Date: Tue, 10 Sep 2013 09:51:01 +0200
Subject: Re: Out of core execution has no effect on GC crash
To: user@giraph.apache.org

As David mentions, even with OOC the objects are still created (and often destroyed soon after being spilled to disk), which puts pressure on the GC. Moreover, as the graph grows in size, the in-memory vertices are not the only growing chunk of memory: other memory stores around the codebase fill up as well, such as caches.

Try increasing the heap to something reasonable for your machines.

On Tue, Sep 10, 2013 at 3:21 AM, David Boyd wrote:
> Alexander:
>     You might try turning off the GC overhead limit
> (-XX:-UseGCOverheadLimit).
> You could also turn on verbose GC logging (-verbose:gc
> -Xloggc:/tmp/@taskid@.gc) to see what is happening.
> Because the OOC still has to create and destroy objects, I suspect that
> the heap is just getting badly fragmented.
>
> There are also options you can set in Java to change the type of garbage
> collection and how it is scheduled.
>
> You might up the heap size slightly - what is the default heap size on
> your cluster?
>
> On 9/9/2013 8:33 PM, Alexander Asplund wrote:
>> A small note: I'm not seeing any partitions directory being formed
>> under _bsp, which is where I understand they should be appearing.
>>
>> On 9/10/13, Alexander Asplund wrote:
>>> Really appreciate the swift responses! Thanks again.
>>>
>>> I have not both increased mapper tasks and decreased the max number of
>>> partitions at the same time. I first did tests with increased mapper
>>> heap, but reset the setting after it apparently caused other
>>> large-volume, non-Giraph jobs to crash nodes when reducers were also
>>> running.
>>>
>>> I'm curious why increasing mapper heap is a requirement. Shouldn't the
>>> OOC mode be able to work with the amount of heap that is available? Is
>>> there some agreement on the minimum amount of heap necessary for OOC
>>> to succeed, to guide the choice of mapper heap?
>>>
>>> Either way, I will try increasing the mapper heap again as much as
>>> possible, which hopefully will run.
>>>
>>> On 9/9/13, Claudio Martella wrote:
>>>
>>>> did you extend the heap available to the mapper tasks? e.g. through
>>>> mapred.child.java.opts.
>>>>
>>>> On Tue, Sep 10, 2013 at 12:50 AM, Alexander Asplund wrote:
>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>> I tried setting giraph.maxPartitionsInMemory to 1, but I'm still
>>>>> getting OOM: GC limit exceeded.
>>>>>
>>>>> Are there any particular cases the OOC will not be able to handle, or
>>>>> is it supposed to work in all cases? If the latter, it might be that
>>>>> I have made some configuration error.
>>>>>
>>>>> I do have one concern that might indicate I have done something
>>>>> wrong: to allow OOC to activate without crashing, I had to modify the
>>>>> trunk code. This was because Giraph relied on guava-12 and
>>>>> DiskBackedPartitionStore used hashInt() - a method which does not
>>>>> exist in guava-11, which hadoop 2 depends on. At runtime, guava 11
>>>>> was being used.
>>>>>
>>>>> I suppose this problem might indicate I'm submitting the job using
>>>>> the wrong binary. Currently I am including the Giraph dependencies
>>>>> with the jar, and running using hadoop jar.
>>>>>
>>>>> On 9/7/13, Claudio Martella wrote:
>>>>>
>>>>>> OOC is used also at the input superstep.
>>>>>> Try to decrease the number of partitions kept in memory.
>>>>>>
>>>>>> On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm trying to process a graph that is about three times the size
>>>>>>> of available memory. On the other hand, there is plenty of disk
>>>>>>> space. I have enabled the giraph.useOutOfCoreGraph property, but
>>>>>>> it still crashes with OutOfMemoryError: GC limit exceeded when I
>>>>>>> try running my job.
>>>>>>>
>>>>>>> I'm wondering if the spilling is supposed to work during the input
>>>>>>> step. If so, are there any additional steps that must be taken to
>>>>>>> ensure it functions?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Alexander Asplund
>>>>>>
>>>>>> --
>>>>>> Claudio Martella
>>>>>> claudio.martella@gmail.com
>>>>>
>>>>> --
>>>>> Alexander Asplund
>>>>
>>>> --
>>>> Claudio Martella
>>>> claudio.martella@gmail.com
>>>
>>> --
>>> Alexander Asplund
>
> --
> ========= mailto:dboyd@data-tactics.com =========
> David W. Boyd
> Director, Engineering
> 7901 Jones Branch, Suite 700
> McLean, VA 22102
> office: +1-571-279-2122
> fax:    +1-703-506-6703
> cell:   +1-703-402-7908
> ============== http://www.data-tactics.com/ ==============
> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
> President - USSTEM Foundation - www.usstem.org
>
> The information contained in this message may be privileged
> and/or confidential and protected from disclosure.
> If the reader of this message is not the intended recipient
> or an employee or agent responsible for delivering this message
> to the intended recipient, you are hereby notified that any
> dissemination, distribution or copying of this communication
> is strictly prohibited. If you have received this communication
> in error, please notify the sender immediately by replying to
> this message and deleting the material from any computer.
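The flags David suggests above can be combined with a larger heap in the mapper JVM options. A minimal sketch, assuming a 4g heap (a placeholder - size it for your machines) and a placeholder jar name; @taskid@ is expanded by Hadoop per task attempt:

```shell
# Mapper JVM options combining the suggestions in this thread.
# -XX:-UseGCOverheadLimit disables the "GC overhead limit exceeded" abort;
# -verbose:gc -Xloggc:... writes a per-task GC log you can inspect later.
MAPPER_OPTS="-Xmx4g -XX:-UseGCOverheadLimit -verbose:gc -Xloggc:/tmp/@taskid@.gc"

# These would be handed to the job through mapred.child.java.opts, e.g.:
#   hadoop jar <your-giraph-job.jar> org.apache.giraph.GiraphRunner \
#     -D mapred.child.java.opts="$MAPPER_OPTS" ...
echo "$MAPPER_OPTS"
```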
--
Claudio Martella
claudio.martella@gmail.com
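For reference, the out-of-core properties discussed in the thread above can be set as generic options on the job. A sketch only: the jar name and computation class are placeholders, and maxPartitionsInMemory=1 is the extreme setting Alexander tried, not a recommended default:

```shell
# Out-of-core settings from this thread: enable spilling to disk and keep
# at most one partition per worker in memory.
GIRAPH_OOC_OPTS="-D giraph.useOutOfCoreGraph=true -D giraph.maxPartitionsInMemory=1"

# A job submission would then look roughly like:
#   hadoop jar <your-giraph-job.jar> org.apache.giraph.GiraphRunner \
#     my.app.MyComputation $GIRAPH_OOC_OPTS ...
echo "$GIRAPH_OOC_OPTS"
```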