hadoop-user mailing list archives

From x6i4uybz labs <x6i4uyzbz.l...@gmail.com>
Subject Re: M/R, Strange behavior with multiple Gzip files
Date Thu, 06 Dec 2012 14:40:51 GMT
Hello,

The job isn't running in local mode. In fact, I think the only problem is
with how map task progress is reported.
The counters of each map task advance normally during job execution, but the
reported progress of each map task stays at 0%.
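
One possible explanation, offered as an assumption and not something confirmed in this thread: a line-based record reader typically estimates progress from its byte position within the split. If the end offset of a compressed stream is unknown and gets set to Long.MAX_VALUE, the ratio stays effectively at zero even though records (and counters) keep advancing. The sketch below only illustrates that arithmetic; the class and method names are hypothetical and this is not Hadoop's actual source.

```java
// Sketch only: mirrors the arithmetic of a position-based progress estimate.
// ProgressSketch and progress() are hypothetical names, not Hadoop APIs.
class ProgressSketch {

    // Fraction of the byte range [start, end) consumed so far, clamped to 1.0.
    static float progress(long start, long pos, long end) {
        if (end == start) {
            return 0.0f; // empty range: nothing to measure
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        long hundredMb = 100L * 1024 * 1024;

        // Uncompressed split: the end offset is known, so progress is meaningful.
        System.out.println(progress(0, hundredMb / 2, hundredMb));

        // Assumed gzip case: the end offset is unknown and set to Long.MAX_VALUE,
        // so even a fully read 100 MB file reports a value that is effectively zero.
        System.out.println(progress(0, hundredMb, Long.MAX_VALUE));
    }
}
```

If this assumption holds for the Hadoop version in use, the 0% display would be a reporting artifact and the counters would be the more reliable signal.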



On Thu, Dec 6, 2012 at 1:34 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi,
>
> Have you configured mapred-site.xml to tell where the job tracker
> is? If not, your job is running on the local job runner, which runs the
> tasks one by one.
>
> JM
>
> PS: I faced the same issue a few weeks ago and got the exact same
> behaviour. This (above) solved the issue.
>
> 2012/12/6, x6i4uybz labs <x6i4uyzbz.labs@gmail.com>:
> > Sorry,
> >
> > I wrote an M/R job to process several gz files (about 2000). I have a
> > cluster with 80 map slots.
> > The JT instantiates one map per gz file (not splittable, that's OK).
> >
> > The first 80 maps spawn. But after the "initializing" state, it seems only
> > one map is running. When this map finishes, another one starts (not 80
> > maps in parallel) and another is assigned to the empty slot.
> >
> > I've also noticed that the first maps last more than one hour and the last
> > maps 50 seconds.
> > Each gz file is between 10 MB and 100 MB.
> >
> > I don't understand the behavior.
> > I will launch the job again to see if I have the same issue.
> >
> > thanks, gpo
> >
> > On Wed, Dec 5, 2012 at 6:33 PM, Harsh J <harsh@cloudera.com> wrote:
> >
> >> Your problem isn't clear in your description - can you please
> >> rephrase/redefine in terms of what you are expecting vs. what you are
> >> observing.
> >>
> >> Also note that Gzip files are not splittable by nature of their codec
> >> algorithm, and hence a TextInputFormat over plain/regular Gzip files
> >> would end up spawning and/or processing one whole Gzip file via one
> >> mapper, instead of multiple mappers per file.
> >>
> >> On Wed, Dec 5, 2012 at 9:32 PM, x6i4uybz labs <x6i4uyzbz.labs@gmail.com> wrote:
> >> > Hi everybody,
> >> >
> >> > I have an M/R job which does a bulk import to HBase.
> >> > I have to process many gzip files (2800 x ~100 MB).
> >> >
> >> > I don't understand why my job instantiates 80 maps but runs each map
> >> > sequentially, as if there were only one big gz file.
> >> >
> >> > Is there a problem in my driver? Or maybe I'm missing something.
> >> > I use "FileInputFormat.addInputPath(job, new Path(args[0]))" where args[0]
> >> > is a directory.
> >> >
> >> > Can you help me, please ?
> >> >
> >> > Thanks, Guillaume
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
> >
>
