hadoop-mapreduce-user mailing list archives

From Andrew Johnson <ajohn...@etsy.com>
Subject Re: Mapreduce HistoryServer Hangs Processing Large Jobs
Date Fri, 23 Jan 2015 22:52:07 GMT
I can only attach up to a 10 MB file to the jira, so I've added a subset of
the file.

On Fri, Jan 23, 2015 at 4:52 PM, Chris Nauroth <cnauroth@hortonworks.com>
wrote:

> Yes, if you can attach the history file (or a subset) to MAPREDUCE-6222,
> then that would help.  Thanks again.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
> On Fri, Jan 23, 2015 at 1:40 PM, Andrew Johnson <ajohnson@etsy.com> wrote:
>
>> Hi Chris,
>>
>> I've created the MAPREDUCE jira:
>> https://issues.apache.org/jira/browse/MAPREDUCE-6222
>>
>> Would it be useful to have the actual history file? I can truncate it
>> somewhat if the full 500 MB is too much.
>>
>> On Fri, Jan 23, 2015 at 4:30 PM, Chris Nauroth <cnauroth@hortonworks.com>
>> wrote:
>>
>>> Hi Andrew,
>>>
>>> I haven't seen this myself, but I can tell from the jstack output that
>>> numerous threads are blocked waiting to run synchronized methods.  The most
>>> interesting thread is the one currently holding the lock (stack trace
>>> truncated a bit):
>>>
>>> "1592215278@qtp-1568218435-195" daemon prio=10 tid=0x000000000135d000 nid=0xb076 runnable [0x00007f2d7b4f1000]
>>>    java.lang.Thread.State: RUNNABLE
>>>     at java.lang.StringCoding.encode(StringCoding.java:364)
>>>     at java.lang.String.getBytes(String.java:939)
>>>     at org.apache.avro.util.Utf8$2.toUtf8(Utf8.java:123)
>>>     at org.apache.avro.util.Utf8.getBytesFor(Utf8.java:172)
>>>     at org.apache.avro.util.Utf8.<init>(Utf8.java:39)
>>>     at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:214)
>>>     at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
>>>     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:363)
>>>     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:355)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157)
>>>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>>>     at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:219)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>>>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>>>     at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:219)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>>>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>>>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
>>>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>>>     at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:89)
>>>     at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
>>>     - locked <0x00007f344a0cf0d8> (a org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser)
>>>     at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:153)
>>>     - locked <0x00007f344a0cf0d8> (a org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser)
>>>     at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:139)
>>>     - locked <0x00007f344a0cf0d8> (a org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser)
>>>     at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:338)
>>>     - locked <0x00007f344a0c2388> (a org.apache.hadoop.mapreduce.v2.hs.CompletedJob)
>>>     at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:101)
>>>     at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:413)
>>>     - locked <0x00007f2df576ffb0> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
>>>     at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:106)
>>>
>>> This seems to indicate the thread is stuck deserializing Avro data from
>>> the history file.  Perhaps it's something particular to the data in your
>>> history files.  I do see an open Avro issue reporting an infinite loop
>>> condition during deserialization.
>>>
>>> https://issues.apache.org/jira/browse/AVRO-1422
>>>
>>> I don't know Avro well enough to be certain that this is the root cause
>>> though.
>>>
>>> Do you want to submit a MAPREDUCE jira with this information?  Even if
>>> the root cause is in Avro, we'd want to track upgrading our Avro dependency
>>> once a fix becomes available.  Thanks!
>>>
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>> On Fri, Jan 23, 2015 at 12:42 PM, Andrew Johnson <ajohnson@etsy.com>
>>> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I'm encountering an issue with the Mapreduce HistoryServer processing
>>>> the history files for large jobs.  This has come up several times for
>>>> jobs with around 60000 total tasks.  When the HistoryServer loaded the
>>>> .jhist file from HDFS for a job of that size (usually around 500 MB),
>>>> its CPU usage spiked and the UI became unresponsive.  After about 10
>>>> minutes I restarted the HistoryServer, and it behaved normally again.
>>>>
>>>> The cluster is running CDH 5.3 (2.5.0-cdh5.3.0).  I've attached the
>>>> output of jstack from a time this was occurring.  I do have an example
>>>> .jhist file that caused the problem, but have not attached it due to its
>>>> size.
>>>>
>>>> Has anyone else seen this happen before?
>>>>
>>>> Thanks for your help!
>>>>
>>>> --
>>>> Andrew Johnson
>>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>>
>> --
>> Andrew Johnson
>> Software Engineer, Etsy
>>
>
>



-- 
Andrew Johnson
Software Engineer, Etsy
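
The jstack reading in the quoted reply (many threads blocked on synchronized methods, one RUNNABLE thread holding the monitors) can be checked mechanically on a full dump. The following is a minimal Python sketch under stated assumptions: the function name summarize_locks is made up for illustration, the second thread in the sample dump ("hypothetical-waiter-1", including its frame and ids) is invented because the blocked threads were not quoted in this thread, and only the "- locked <addr>" and "- waiting to lock <addr>" line shapes are handled.

```python
import re
from collections import defaultdict

# A thread entry in jstack output begins with the thread name in double quotes.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Monitor annotations printed under a stack frame.
LOCKED = re.compile(r'-\s+locked\s+<(0x[0-9a-f]+)>')
WAITING = re.compile(r'-\s+waiting to lock\s+<(0x[0-9a-f]+)>')


def summarize_locks(dump):
    """Map each contended monitor address to (holder thread, waiting threads).

    Only monitors that at least one thread is waiting to lock are reported.
    The holder is None if no thread in the dump shows the monitor as locked.
    """
    holders = {}
    waiters = defaultdict(list)
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        if current is None:
            continue
        locked = LOCKED.search(line)
        if locked:
            # Re-entrant acquisitions repeat the same address; keep the first.
            holders.setdefault(locked.group(1), current)
            continue
        waiting = WAITING.search(line)
        if waiting:
            waiters[waiting.group(1)].append(current)
    return {addr: (holders.get(addr), names) for addr, names in waiters.items()}


# First thread: header and last locked frame taken from the trace in this thread.
# Second thread: entirely hypothetical, added only to show a blocked waiter.
SAMPLE_DUMP = '''
"1592215278@qtp-1568218435-195" daemon prio=10 tid=0x000000000135d000 nid=0xb076 runnable [0x00007f2d7b4f1000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:413)
    - locked <0x00007f2df576ffb0> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)

"hypothetical-waiter-1" daemon prio=10 tid=0x0000000000000001 nid=0x1 waiting for monitor entry [0x0000000000000000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:413)
    - waiting to lock <0x00007f2df576ffb0> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
'''

if __name__ == "__main__":
    for addr, (holder, names) in summarize_locks(SAMPLE_DUMP).items():
        print(addr, "held by", holder, "with", len(names), "waiting thread(s)")
```

On a real dump, a single holder with a long list of waiters on the same address would match the pattern described in the reply. A thread parked in Object.wait() can also print a "- locked" line for a monitor it has released, so the holder reported by this sketch should be confirmed against the thread's state.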
