hadoop-common-dev mailing list archives

From Koji Noguchi <knogu...@yahoo-inc.com>
Subject Re: Gzip progress during map phase.
Date Tue, 27 Dec 2011 11:07:53 GMT
Assuming you're using TextInputFormat, it sounds like
https://issues.apache.org/jira/browse/MAPREDUCE-773
The fix is in 0.21. I don't know about CDH.

Koji
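
As a rough illustration of the behaviour discussed in this thread (a sketch, not the MAPREDUCE-773 patch and not Hadoop code): when progress for a compressed input is measured against decompressed bytes, the total is not known up front, so a reader has little to report until it finishes. One alternative is to measure how much of the compressed file has been consumed. The Python below shows that idea with the standard gzip module; the function names are hypothetical.

```python
import gzip
import io


def compressed_progress(raw, total_compressed_bytes):
    """Fraction of the compressed input consumed so far, clamped to [0, 1]."""
    if total_compressed_bytes == 0:
        return 1.0
    return min(1.0, raw.tell() / total_compressed_bytes)


def read_lines_with_progress(compressed_bytes):
    """Yield (line, progress) pairs for gzip-compressed data.

    Progress is based on the position in the underlying compressed
    stream, not on the number of decompressed bytes produced.
    """
    raw = io.BytesIO(compressed_bytes)
    total = len(compressed_bytes)
    with gzip.GzipFile(fileobj=raw, mode="rb") as lines:
        for line in lines:
            yield line, compressed_progress(raw, total)
```

Progress measured this way moves in steps the size of the decompressor's read buffer, so a small file may still appear to jump straight to 100%.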


On 12/27/11 2:00 AM, "Niels Basjes" <Niels@basjes.nl> wrote:

> I would not expect this. I would expect behaviour that is independent of
> the way the splits are created.
> 
> -- 
> Kind regards,
> Niels Basjes
> (Sent from mobile)
> On 26 Dec 2011 07:57, "Anthony Urso" <anthonyu@cs.ucla.edu> wrote:
> 
>> Gzip files (unlike uncompressed files) are not splittable, which may be
>> causing the behavior that you described.
>> On Dec 24, 2011 6:24 AM, "Niels Basjes" <Niels@basjes.nl> wrote:
>> 
>>> Hi,
>>> 
>>> I noticed that the mapper progress indication in the hadoop cdh3
>>> distribution jumps from 0% to 100% for each gzipped input file. So when
>>> running with big gzipped input files the job appears to be stuck.
>>> 
>>> I was unable to find a jira issue that describes this effect.
>>> Before I dive into this I have a few questions for you:
>>> 1) Is this a known effect for the 0.20 version? If so, what is the jira
>>> issue?
>>> 2) Is this specific to gzip?
>>> 3) Is this effect still present in the MRv2/yarn version of Hadoop?
>>> 
>>> Thanks.
>>> --
>>> Kind regards,
>>> Niels Basjes
>>> (Sent from mobile)
>>> 
>> 

