hadoop-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: M/R, Strange behavior with multiple Gzip files
Date Thu, 06 Dec 2012 12:34:17 GMT
Hi,

Have you configured mapred-site.xml to tell the job where the JobTracker
is? If not, your job is running in the local job runner, which executes
the tasks one by one.
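
As a rough way to check this, here is a minimal sketch in plain Java (no
Hadoop dependency) that reads the contents of a mapred-site.xml and reports
whether mapred.job.tracker is unset or "local", which is the Hadoop 1.x
default. The class name JobTrackerCheck, its helper methods, and the host
and port jobtracker.example.com:8021 are made up for illustration; they
are not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not a Hadoop class.
public class JobTrackerCheck {

    // Returns the value of the named <property>, or defaultValue when absent.
    static String getProperty(String xml, String name, String defaultValue)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return defaultValue;
    }

    // True when the job would fall back to the local, one-task-at-a-time runner.
    static boolean runsLocally(String xml) throws Exception {
        return "local".equals(getProperty(xml, "mapred.job.tracker", "local"));
    }

    public static void main(String[] args) throws Exception {
        // Example file contents; the host and port are placeholders.
        String configured =
            "<configuration>"
          + "<property>"
          + "<name>mapred.job.tracker</name>"
          + "<value>jobtracker.example.com:8021</value>"
          + "</property>"
          + "</configuration>";
        String unconfigured = "<configuration/>";

        System.out.println("configured runs locally: " + runsLocally(configured));
        System.out.println("unconfigured runs locally: " + runsLocally(unconfigured));
    }
}
```

If the check reports a local run, adding a mapred.job.tracker property
with your JobTracker's host:port to the mapred-site.xml on the client
classpath should make the job go to the cluster.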

JM

PS: I faced the same issue a few weeks ago and got the exact same
behaviour. The change above solved it.

2012/12/6, x6i4uybz labs <x6i4uyzbz.labs@gmail.com>:
> Sorry,
>
> I wrote an M/R job to process several gz files (about 2000). I have an
> 80-map-slot cluster.
> The JT instantiates one map per gz file (not splittable, it's OK).
>
> The first 80 maps spawn. But after the "initializing" state, it seems only
> one map is running. When that map finishes, another one starts (not 80
> maps in parallel) and another is assigned to the empty slot.
>
> I've also noticed that the first maps last more than one hour, while the
> last maps take 50 seconds.
> Each gz file is between 10 MB and 100 MB.
>
> I don't understand the behavior.
> I will launch the job again to see if I have the same issue.
>
> thanks, gpo
>
> On Wed, Dec 5, 2012 at 6:33 PM, Harsh J <harsh@cloudera.com> wrote:
>
>> Your problem isn't clear in your description - can you please
>> rephrase/redefine in terms of what you are expecting vs. what you are
>> observing.
>>
>> Also note that Gzip files are not splittable by nature of their codec
>> algorithm, and hence a TextInputFormat over plain/regular Gzip files
>> would end up spawning and/or processing one whole Gzip file via one
>> mapper, instead of multiple mappers per file.
>>
>> On Wed, Dec 5, 2012 at 9:32 PM, x6i4uybz labs <x6i4uyzbz.labs@gmail.com>
>> wrote:
>> > Hi everybody,
>> >
>> > I have an M/R job which does a bulk import to HBase.
>> > I have to process many gzip files (2800 x ~100 MB).
>> >
>> > I don't understand why my job instantiates 80 maps but runs each map
>> > sequentially, as if there were only one big gz file.
>> >
>> > Is there a problem in my driver? Or maybe I'm missing something.
>> > I use "FileInputFormat.addInputPath(job, new Path(args[0]))" where
>> > args[0] is a directory.
>> >
>> > Can you help me, please ?
>> >
>> > Thanks, Guillaume
>>
>>
>>
>> --
>> Harsh J
>>
>
