giraph-user mailing list archives

From Alexander Asplund <alexaspl...@gmail.com>
Subject Re: Out of core execution has no effect on GC crash
Date Tue, 10 Sep 2013 21:54:54 GMT
Thanks, disabling GC overhead limit did the trick!

I did, however, run into another issue - the computation ends up
stalling when it tries to write partitions to disk. All the workers
keep logging messages saying that DiskBackedPartitionStore failed to
create directory _bsp/_partitions/_jobxxxxx/part-vertices-xxx
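
For reference, the settings discussed in this thread can be collected in one place. This is only a minimal Python sketch, not anything that ships with Giraph: the helper name build_giraph_options and the -Xmx2g heap value are assumptions made for illustration, and passing the properties as -D generic options assumes the job is launched in a way that parses Hadoop generic options.

```python
import shlex


def build_giraph_options(max_partitions_in_memory=1, heap="-Xmx2g", gc_log=True):
    """Return Hadoop -D options for the settings mentioned in this thread.

    The heap value is a placeholder (assumption); pick one that fits the nodes.
    """
    # JVM flags for the mapper tasks: heap size plus the GC overhead limit switch.
    jvm_opts = [heap, "-XX:-UseGCOverheadLimit"]
    if gc_log:
        # Verbose GC logging, one log file per task attempt.
        jvm_opts += ["-verbose:gc", "-Xloggc:/tmp/@taskid@.gc"]

    props = {
        "giraph.useOutOfCoreGraph": "true",
        "giraph.maxPartitionsInMemory": str(max_partitions_in_memory),
        "mapred.child.java.opts": " ".join(jvm_opts),
    }
    return ["-D{}={}".format(key, value) for key, value in props.items()]


if __name__ == "__main__":
    # Print the options shell-quoted so they can be pasted into a command line.
    print(" ".join(shlex.quote(opt) for opt in build_giraph_options()))
```

The printed options would go after the jar and main class on the hadoop jar command line. The sketch does nothing about the directory creation failure described above.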

On 9/10/13, Claudio Martella <claudio.martella@gmail.com> wrote:
> As David mentions, even with OOC, the objects are still created (and yes,
> often destroyed soon after being spilled to disk), putting pressure on the
> GC. Moreover, as the graph grows, the in-memory vertices are not the only
> part of memory that increases: other memory stores around the codebase,
> such as caches, get filled as well.
>
> Try increasing the heap to something reasonable for your machines.
>
>
> On Tue, Sep 10, 2013 at 3:21 AM, David Boyd
> <dboyd@data-tactics-corp.com> wrote:
>
>> Alexander:
>>     You might try turning off the GC overhead limit
>> (-XX:-UseGCOverheadLimit).
>> You could also turn on verbose GC logging
>> (-verbose:gc -Xloggc:/tmp/@taskid@.gc) to see what is happening.
>> Because OOC still has to create and destroy objects, I suspect that the
>> heap is just getting really fragmented.
>>
>> There are also Java options that change the type of garbage collection
>> and how it is scheduled.
>>
>> You might up the heap size slightly - what is the default heap size on
>> your cluster?
>>
>>
>> On 9/9/2013 8:33 PM, Alexander Asplund wrote:
>>
>>> A small note: I'm not seeing any partitions directory being formed
>>> under _bsp, which is where I have understood that they should be
>>> appearing.
>>>
>>> On 9/10/13, Alexander Asplund <alexasplund@gmail.com> wrote:
>>>
>>>> Really appreciate the swift responses! Thanks again.
>>>>
>>>> I have not both increased mapper tasks and decreased max number of
>>>> partitions at the same time. I first did tests with increased Mapper
>>>> heap available, but reset the setting after it apparently caused
>>>> other, large volume, non-Giraph jobs to crash nodes when reducers also
>>>> were running.
>>>>
>>>> I'm curious why increasing mapper heap is a requirement. Shouldn't the
>>>> OOC mode be able to work with the amount of heap that is available? Is
>>>> there some agreement on the minimum amount of heap necessary for OOC
>>>> to succeed, to guide the choice of Mapper heap amount?
>>>>
>>>> Either way, I will try increasing mapper heap again as much as
>>>> possible, and hopefully the job will then run.
>>>>
>>>> On 9/9/13, Claudio Martella <claudio.martella@gmail.com> wrote:
>>>>
>>>>> did you extend the heap available to the mapper tasks? e.g. through
>>>>> mapred.child.java.opts.
>>>>>
>>>>>
>>>>> On Tue, Sep 10, 2013 at 12:50 AM, Alexander Asplund
>>>>> <alexasplund@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the reply.
>>>>>>
>>>>>> I tried setting giraph.maxPartitionsInMemory to 1, but I'm still
>>>>>> getting OOM: GC overhead limit exceeded.
>>>>>>
>>>>>> Are there any particular cases the OOC will not be able to handle, or
>>>>>> is it supposed to work in all cases? If the latter, it might be that
>>>>>> I have made some configuration error.
>>>>>>
>>>>>> I do have one concern that might indicate I have done something wrong:
>>>>>> to allow OOC to activate without crashing I had to modify the trunk
>>>>>> code. This was because Giraph relied on guava-12 and
>>>>>> DiskBackedPartitionStore used hasInt() - a method which does not exist
>>>>>> in guava-11, which hadoop 2 depends on. At runtime guava 11 was being
>>>>>> used.
>>>>>>
>>>>>> I suppose this problem might indicate that I'm submitting the job
>>>>>> using the wrong binary. Currently I am including the giraph
>>>>>> dependencies with the jar, and running using hadoop jar.
>>>>>>
>>>>>> On 9/7/13, Claudio Martella <claudio.martella@gmail.com> wrote:
>>>>>>
>>>>>>> OOC is also used at the input superstep. Try to decrease the number
>>>>>>> of partitions kept in memory.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund
>>>>>>> <alexasplund@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm trying to process a graph that is about 3 times the size of
>>>>>>>> available memory. On the other hand, there is plenty of disk space.
>>>>>>>> I have enabled the giraph.useOutOfCoreGraph property, but it still
>>>>>>>> crashes with OutOfMemoryError: GC overhead limit exceeded when I
>>>>>>>> try running my job.
>>>>>>>>
>>>>>>>> I'm wondering if the spilling is supposed to work during the input
>>>>>>>> step. If so, are there any additional steps that must be taken to
>>>>>>>> ensure it functions?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Alexander Asplund
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>     Claudio Martella
>>>>>>>     claudio.martella@gmail.com
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Alexander Asplund
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>>     Claudio Martella
>>>>>     claudio.martella@gmail.com
>>>>>
>>>>>
>>>> --
>>>> Alexander Asplund
>>>>
>>>>
>>>
>>
>> --
>> ========= mailto:dboyd@data-tactics.com ============
>> David W. Boyd
>> Director, Engineering
>> 7901 Jones Branch, Suite 700
>> Mclean, VA 22102
>> office:   +1-571-279-2122
>> fax:     +1-703-506-6703
>> cell:     +1-703-402-7908
>> ============== http://www.data-tactics.com.com/ ============
>> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
>> President - USSTEM Foundation - www.usstem.org
>>
>>
>>
>>
>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com
>


-- 
Alexander Asplund
