hadoop-yarn-dev mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: RM Suddenly gets killed
Date Mon, 18 Mar 2013 18:04:09 GMT
How big is your heap?  Almost all of the threads have an uncaught
exception handler that kills the process if it catches an OOM.  It
tries to log that it caught the exception, but that does not always work
if the process is truly out of resources.  So you may not see anything
except the process exiting.
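
To illustrate the behavior described above, here is a minimal sketch of an
uncaught exception handler that exits on an OutOfMemoryError. It is not the
actual RM code: the class name OomExitHandler and the injectable exit action
are assumptions made for the example, and the exit code -1 is arbitrary.

```java
import java.util.function.IntConsumer;

// Hypothetical handler for illustration only; not the ResourceManager's class.
final class OomExitHandler implements Thread.UncaughtExceptionHandler {
    // The exit action is injectable so the handler can be exercised without
    // killing the JVM. In a real process it would halt the JVM.
    private final IntConsumer exitAction;

    OomExitHandler(IntConsumer exitAction) {
        this.exitAction = exitAction;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (error instanceof OutOfMemoryError) {
            try {
                // Best effort: this can itself fail when memory is exhausted,
                // which is why the log may show nothing before the exit.
                System.err.println("Exiting on OutOfMemoryError in thread "
                        + thread.getName());
            } catch (Throwable ignored) {
                // Nothing more can be done; fall through and exit.
            }
            exitAction.accept(-1);
        } else {
            System.err.println("Uncaught exception in thread "
                    + thread.getName() + ": " + error);
        }
    }

    public static void main(String[] args) {
        // Install the handler for every thread that has no handler of its own.
        Thread.setDefaultUncaughtExceptionHandler(
                new OomExitHandler(code -> Runtime.getRuntime().halt(code)));
        System.out.println("handler installed");
    }
}
```

With a handler like this, a failed log write followed by an immediate halt
would leave no trace in the logs, only the process disappearing.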

You could turn on GC statistics and look there.
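
GC statistics are normally turned on with JVM options such as -verbose:gc on
the RM's command line. As a rough sketch of what those numbers are, the
standard java.lang.management API exposes the same kind of counters from
inside a JVM. The class name GcStats is an assumption for the example; it
reports on the JVM it runs in, not on a separate RM process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration; prints heap usage and GC counters.
final class GcStats {
    static String summarize() {
        StringBuilder sb = new StringBuilder();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() may be -1 if the maximum heap size is undefined.
        sb.append("heap used=").append(heap.getUsed())
          .append(" max=").append(heap.getMax()).append('\n');
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated time (ms) for each collector.
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summarize());
    }
}
```

A heap that stays close to its maximum while the collection counts and times
keep climbing would point toward the OOM explanation.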

--Bobby

On 3/18/13 11:34 AM, "Ravi Prakash" <ravihoo@ymail.com> wrote:

>Muntasir,
>
>We have been running the RM with millions of jobs without it crashing.
>Could you please attach the tail of the RM logs? Perhaps a megabyte of it
>or so?
>
>Thanks
>Ravi
>
>
>
>
>________________________________
> From: Sandy Ryza <sandy.ryza@cloudera.com>
>To: yarn-dev@hadoop.apache.org
>Sent: Monday, March 18, 2013 2:03 AM
>Subject: Re: RM Suddenly gets killed
> 
>There's no special signal I'm aware of other than an exception showing up
>somewhere, probably near the end of the logs.  If this is occurring
>consistently for you, filing a JIRA with steps to reproduce would be much
>appreciated.
>
>-Sandy
>
>On Sun, Mar 17, 2013 at 7:18 PM, Muntasir Raihan Rahman <
>muntasir.raihan@gmail.com> wrote:
>
>> Hi,
>>
>> I am using the capacity scheduler.
>>
>> The only KILL messages I see in the yarn logs are related to killing
>> containers. Is there any special signal I should look for in the logs
>> that would indicate RM problems?
>>
>> Thanks
>> Muntasir.
>>
>> On Sun, Mar 17, 2013 at 9:09 PM, Sandy Ryza <sandy.ryza@cloudera.com>
>> wrote:
>> > Hi Muntasir,
>> >
>> > Do you know which scheduler you're using?  Does anything show up in
>> > your resourcemanager logs?
>> >
>> > -Sandy
>> >
>> > On Sun, Mar 17, 2013 at 7:03 PM, Muntasir Raihan Rahman <
>> > muntasir.raihan@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> I am using yarn 0.23 for some experiments. I am noticing that the
>> >> RM sometimes gets killed after a bunch of applications (around 70)
>> >> are submitted.
>> >>
>> >> Are there any JIRAs related to this?
>> >>
>> >> Thanks
>> >> Muntasir.
>> >>
>> >> --
>> >> Best Regards
>> >> Muntasir Raihan Rahman
>> >> Email: muntasir.raihan@gmail.com
>> >> Phone: 1-217-979-9307
>> >> Department of Computer Science,
>> >> University of Illinois Urbana Champaign,
>> >> 3111 Siebel Center,
>> >> 201 N. Goodwin Avenue,
>> >> Urbana, IL  61801
>> >>
>>
>>
>>

