hadoop-user mailing list archives

From x6i4uybz labs <x6i4uyzbz.l...@gmail.com>
Subject Re: M/R, Strange behavior with multiple Gzip files
Date Thu, 06 Dec 2012 16:25:42 GMT
Thanks for your answers.

I don't have the whole solution yet, but I know that:
  - the job is not running on a local TT
  - the map process is very slow
  - the progress bar is not working properly

So the map tasks are running in parallel (Hadoop works :)), but I don't
understand why the progress of each map task stays at 0.
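
For readers following the thread, the local-mode check discussed in the quoted replies can be sketched in plain Java. This is only a rough sketch under stated assumptions: it assumes a Hadoop 1.x style setup where the property mapred.job.tracker defaults to "local", it reads a single XML document rather than the merged configuration Hadoop actually builds, and the class name JobTrackerCheck and the host jt.example.com:8021 are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: inspects one site file only.
class JobTrackerCheck {

    /** Returns the value of the named property, or null if it is absent. */
    static String readProperty(InputStream siteXml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(siteXml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    /** Assumption: an unset value or the literal "local" means the local runner. */
    static boolean runsLocally(String jobTracker) {
        return jobTracker == null || jobTracker.equals("local");
    }

    public static void main(String[] args) throws Exception {
        // Sample content standing in for mapred-site.xml; the host is made up.
        String xml = "<configuration><property>"
                + "<name>mapred.job.tracker</name>"
                + "<value>jt.example.com:8021</value>"
                + "</property></configuration>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        String jobTracker = readProperty(in, "mapred.job.tracker");
        System.out.println("mapred.job.tracker = " + jobTracker);
        System.out.println("local mode? " + runsLocally(jobTracker));
    }
}
```

A "LocalJobRunner" line in the job client log, as Harsh mentions, remains the more direct signal; a file check like this only helps to confirm what the client was configured with.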

On Thu, Dec 6, 2012 at 3:48 PM, Harsh J <harsh@cloudera.com> wrote:

> I tend to agree with Jean-Marc's observation. If your job client logs
> a "LocalJobRunner" at any point, then that is most definitely your
> problem.
>
> Otherwise, if you feel you are facing a scheduling problem, then it
> may most likely be your scheduler configuration. For example,
> FairScheduler has a <maxMaps/> attribute over its pools that you can
> set to control maximum parallel use of slots for jobs using that pool,
> etc.
>
> On Thu, Dec 6, 2012 at 8:10 PM, x6i4uybz labs <x6i4uyzbz.labs@gmail.com>
> wrote:
> > Hello,
> >
> > The job isn't running in local mode. In fact, I think I just have a
> > problem with the map task progress.
> > The counters of each map task are OK during job execution, whereas the
> > progress of each map task stays at 0%.
> >
> >
> >
> > On Thu, Dec 6, 2012 at 1:34 PM, Jean-Marc Spaggiari
> > <jean-marc@spaggiari.org> wrote:
> >>
> >> Hi,
> >>
> >> Have you configured mapred-site.xml to say where the job tracker
> >> is? If not, your job is running on the local jobtracker, running the
> >> tasks one by one.
> >>
> >> JM
> >>
> >> PS: I faced the same issue a few weeks ago and got the exact same
> >> behaviour. This (above) solved the issue.
> >>
> >> 2012/12/6, x6i4uybz labs <x6i4uyzbz.labs@gmail.com>:
> >> > Sorry,
> >> >
> >> > I wrote an M/R job to process several gz files (about 2000). I have a
> >> > cluster with 80 map slots.
> >> > The JT instantiates one map per gz file (not splittable, which is OK).
> >> >
> >> > The first 80 maps spawn. But after the "initializing" state, it seems
> >> > there is only one map running. When this map finishes, another one
> >> > starts (not 80 maps in parallel) and another is assigned to the empty
> >> > slot.
> >> >
> >> > I've also noticed that the first maps last more than one hour and the
> >> > last maps 50 seconds.
> >> > Each gz file is between 10 MB and 100 MB.
> >> >
> >> > I don't understand this behavior.
> >> > I will launch the job again to see if I get the same issue.
> >> >
> >> > thanks, gpo
> >> >
> >> > On Wed, Dec 5, 2012 at 6:33 PM, Harsh J <harsh@cloudera.com> wrote:
> >> >
> >> >> Your problem isn't clear from your description. Can you please
> >> >> rephrase it in terms of what you are expecting vs. what you are
> >> >> observing?
> >> >>
> >> >> Also note that Gzip files are not splittable by nature of their codec
> >> >> algorithm, and hence a TextInputFormat over plain/regular Gzip files
> >> >> would end up spawning and/or processing one whole Gzip file via one
> >> >> mapper, instead of multiple mappers per file.
> >> >>
> >> >> On Wed, Dec 5, 2012 at 9:32 PM, x6i4uybz labs
> >> >> <x6i4uyzbz.labs@gmail.com>
> >> >> wrote:
> >> >> > Hi everybody,
> >> >> >
> >> >> > I have an M/R job which does a bulk import to HBase.
> >> >> > I have to process many gzip files (2800 x ~100 MB).
> >> >> >
> >> >> > I don't understand why my job instantiates 80 maps but runs each
> >> >> > map sequentially, as if there were only one big gz file.
> >> >> >
> >> >> > Is there a problem in my driver? Or maybe I am missing something.
> >> >> > I use "FileInputFormat.addInputPath(job, new Path(args[0]))" where
> >> >> > args[0] is a directory.
> >> >> >
> >> >> > Can you help me, please?
> >> >> >
> >> >> > Thanks, Guillaume
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Harsh J
> >> >>
> >> >
> >
> >
>
>
>
> --
> Harsh J
>
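
Harsh's point that plain Gzip files are not splittable can be illustrated with the JDK alone. The following is a minimal sketch, not Hadoop code: the class GzipSplitCheck and its methods are hypothetical names, and it only shows that a standard gzip stream decodes from its first byte but is rejected when a reader starts at a later offset, which is why one mapper ends up reading each file from the beginning.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; uses only java.util.zip from the JDK.
class GzipSplitCheck {

    /** Compresses the text into a single gzip stream held in memory. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /**
     * Tries to decompress starting at the given byte offset.
     * Returns the decoded text, or null if the stream cannot be read from there.
     */
    static String decodeFrom(byte[] compressed, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(compressed, offset, compressed.length - offset))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Includes ZipException when the gzip header is not found at this offset.
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("record ").append(i).append('\n');
        }
        byte[] gz = gzip(sb.toString());

        // Reading from the start of the file works.
        System.out.println("from offset 0: " + (decodeFrom(gz, 0) != null));
        // Reading from an arbitrary later offset, as a second split would, is expected to fail.
        System.out.println("from the midpoint: " + (decodeFrom(gz, gz.length / 2) != null));
    }
}
```

With one mapper per file, the figures quoted in the thread (about 2000 files and 80 map slots) imply at least 2000 / 80 = 25 waves of map tasks even when every slot is busy.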
