hadoop-hdfs-user mailing list archives

From Andrew Johnson <ajohn...@etsy.com>
Subject Re: Mapreduce HistoryServer Hangs Processing Large Jobs
Date Fri, 23 Jan 2015 21:40:56 GMT
Hi Chris,

I've created the MAPREDUCE jira:
https://issues.apache.org/jira/browse/MAPREDUCE-6222
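For anyone following along on the jira: the hang pattern in the jstack output (one thread holding the JobHistoryParser lock while every other request thread waits) comes down to contention on synchronized methods. Here's a minimal standalone sketch of that pattern (hypothetical toy code, not the actual HistoryServer classes):

```java
// Toy sketch: one thread holds the intrinsic lock inside a long-running
// synchronized method, so every other caller of any synchronized method on
// the same object blocks, just like the UI threads in the jstack output.
public class SyncHang {
    static class Parser {
        synchronized void parse() throws InterruptedException {
            Thread.sleep(2000); // stands in for the long Avro deserialization
        }
        synchronized String status() {
            return "ok"; // UI-style call; can't run until parse() releases the lock
        }
    }

    public static void main(String[] args) throws Exception {
        Parser p = new Parser();
        Thread worker = new Thread(() -> {
            try {
                p.parse(); // grabs the lock and holds it for ~2 seconds
            } catch (InterruptedException ignored) {
            }
        });
        worker.start();
        Thread.sleep(100); // let the worker acquire the lock first
        long start = System.nanoTime();
        String s = p.status(); // blocks here until parse() finishes
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(s + " after waiting " + waitedMs + " ms");
        worker.join();
    }
}
```

A thread dump taken while `status()` is blocked shows it as BLOCKED waiting on the same monitor the worker holds, which matches what the jstack output shows for the HistoryServer's Jetty threads.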

Would it be useful to have the actual history file? I can truncate it
somewhat if the full 500 MB is too much.
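As far as I can tell, .jhist files are line-oriented (one JSON-encoded event per line after the header), so a line-based truncation should keep the remaining events parseable. A quick sketch of what I'd do (hypothetical helper, written against a stand-in file rather than the real .jhist):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TruncateJhist {
    // Keep only the first maxLines lines of the input file. Because .jhist
    // events are one-per-line, the truncated file stays line-aligned.
    public static void truncate(Path in, Path out, long maxLines) throws IOException {
        try (Stream<String> lines = Files.lines(in)) {
            Files.write(out, lines.limit(maxLines).collect(Collectors.toList()));
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in file; in practice this would be the real .jhist from HDFS.
        List<String> events = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            events.add("{\"event\":" + i + "}");
        }
        Path in = Files.createTempFile("sample", ".jhist");
        Files.write(in, events);

        Path out = Files.createTempFile("truncated", ".jhist");
        truncate(in, out, 100);
        try (Stream<String> s = Files.lines(out)) {
            System.out.println(s.count()); // 100
        }
    }
}
```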

On Fri, Jan 23, 2015 at 4:30 PM, Chris Nauroth <cnauroth@hortonworks.com>
wrote:

> Hi Andrew,
>
> I haven't seen this myself, but I can tell from the jstack output that
> numerous threads are blocked waiting to run synchronized methods.  The most
> interesting thread is the one currently holding the lock (stack trace
> truncated a bit):
>
> "1592215278@qtp-1568218435-195" daemon prio=10 tid=0x000000000135d000
> nid=0xb076 runnable [0x00007f2d7b4f1000]
>    java.lang.Thread.State: RUNNABLE
> at java.lang.StringCoding.encode(StringCoding.java:364)
> at java.lang.String.getBytes(String.java:939)
> at org.apache.avro.util.Utf8$2.toUtf8(Utf8.java:123)
> at org.apache.avro.util.Utf8.getBytesFor(Utf8.java:172)
> at org.apache.avro.util.Utf8.<init>(Utf8.java:39)
> at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:214)
> at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
> at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:363)
> at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:355)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157)
> at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:219)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:219)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
> at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:89)
> at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
> - locked <0x00007f344a0cf0d8> (a org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser)
> at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:153)
> - locked <0x00007f344a0cf0d8> (a org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser)
> at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:139)
> - locked <0x00007f344a0cf0d8> (a org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser)
> at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:338)
> - locked <0x00007f344a0c2388> (a org.apache.hadoop.mapreduce.v2.hs.CompletedJob)
> at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:101)
> at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:413)
> - locked <0x00007f2df576ffb0> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
> at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:106)
>
> This seems to indicate the thread is stuck deserializing Avro data from
> the history file.  Perhaps it's something particular to the data in your
> history files.  I do see an open Avro issue reporting an infinite loop
> condition during deserialization.
>
> https://issues.apache.org/jira/browse/AVRO-1422
>
> I don't know Avro well enough to be certain that this is the root cause
> though.
>
> Do you want to submit a MAPREDUCE jira with this information?  Even if the
> root cause is in Avro, we'd want to track upgrading our Avro dependency
> once a fix becomes available.  Thanks!
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
> On Fri, Jan 23, 2015 at 12:42 PM, Andrew Johnson <ajohnson@etsy.com>
> wrote:
>
>> Hey everyone,
>>
>> I'm encountering an issue with the Mapreduce HistoryServer processing the
>> history files for large jobs.  This has come up several times for jobs
>> with around 60000 total tasks.  When the HistoryServer loaded the .jhist
>> file from HDFS for a job of that size (usually around 500 MB), the
>> HistoryServer's CPU usage spiked and the UI became unresponsive.  After
>> about 10 minutes I restarted the HistoryServer and it was behaving normally
>> again.
>>
>> The cluster is running CDH 5.3 (2.5.0-cdh5.3.0).  I've attached the
>> output of jstack from a time this was occurring.  I do have an example
>> .jhist file that caused the problem, but have not attached it due to its
>> size.
>>
>> Has anyone else seen this happen before?
>>
>> Thanks for your help!
>>
>> --
>> Andrew Johnson
>>
>
>




-- 
Andrew Johnson
Software Engineer, Etsy
