hadoop-common-dev mailing list archives

From Anthony Urso <antho...@cs.ucla.edu>
Subject Re: Gzip progress during map phase.
Date Mon, 26 Dec 2011 06:56:34 GMT
Gzip files (unlike uncompressed files) are not splittable, so each one is
processed as a single input split by one map task. That may be causing the
behavior that you described.
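
To illustrate the point above: a gzip stream has to be decompressed from its
first byte, so a reader cannot start in the middle of the file, but it can
still report progress by watching how many compressed bytes it has consumed.
The following is only a minimal sketch of that idea in plain Python, not
Hadoop's record reader; the function name read_gzip_with_progress and its
chunk_size parameter are made up for this example.

```python
import gzip
import os


def read_gzip_with_progress(path, chunk_size=64 * 1024):
    """Yield (decompressed_chunk, progress) pairs for a gzip file.

    progress is the fraction of *compressed* bytes consumed so far,
    measured on the underlying raw file rather than on the decompressed
    output, whose total size is not known in advance.
    """
    total = os.path.getsize(path)
    with open(path, "rb") as raw, gzip.GzipFile(fileobj=raw) as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            # raw.tell() is the position in the compressed file. It runs
            # ahead of the data actually decoded because of read buffering,
            # so the value is an approximation.
            progress = raw.tell() / total if total else 1.0
            yield chunk, min(progress, 1.0)
```

A caller would iterate over the generator and display the second element of
each pair. Progress measured this way moves in steps the size of the read
buffer, which is still far smoother than a jump from 0% to 100%.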
On Dec 24, 2011 6:24 AM, "Niels Basjes" <Niels@basjes.nl> wrote:

> Hi,
>
> I noticed that the mapper progress indication in the Hadoop CDH3
> distribution jumps from 0% to 100% for each gzipped input file. So when
> running with big gzipped input files, the job appears to be stuck.
>
> I was unable to find a JIRA issue that describes this effect.
> Before I dive into this, I have a few questions for you:
> 1) Is this a known effect for the 0.20 version? If so, what is the JIRA
> issue?
> 2) Is this specific to gzip?
> 3) Is this effect still present in the MRv2/YARN version of Hadoop?
>
> Thanks.
> --
> Kind regards,
> Niels Basjes
> (Sent from mobile)
>
