hadoop-common-user mailing list archives

From Marcus Herou <marcus.he...@tailsweep.com>
Subject Re: hadoop jobs take long time to setup
Date Sun, 28 Jun 2009 22:14:33 GMT
Hi.

Just to be clear: is it the jobtracker that needs the patched code, or is it
the tasktrackers?

Kindly

//Marcus

On Mon, Jun 29, 2009 at 12:08 AM, Mikhail Bautin <mbautin@gmail.com> wrote:

> Marcus,
>
> We currently use 0.20.0 but this patch just inserts 8 lines of code into
> TaskRunner.java, which could certainly be done with 0.18.3.
>
> Yes, this patch just appends additional jars to the child JVM classpath.
>
> I've never really used tmpjars myself, but if it involves uploading multiple
> jar files into HDFS every time a job is started, I see how it can be really
> slow. On our ~80-job workflow this would have really slowed things down.
>
> Thanks,
> Mikhail
>
> On Sun, Jun 28, 2009 at 5:40 PM, Marcus Herou <marcus.herou@tailsweep.com> wrote:
>
> > Makes sense... I will try both rsync and NFS, but I think rsync will beat NFS,
> > since NFS can be slow as hell sometimes. But what the heck, we already have
> > our maven2 repo on NFS, so why not :)
> >
> > Are you saying that this patch makes the client able to configure which
> > "extra" local jar files to add to the classpath when firing up the
> > TaskTrackerChild?
> >
> > To be explicit: can you confirm that using tmpjars like I do is a costly,
> > slow operation?
> >
> > To what branch do you apply the patch (we use 0.18.3)?
> >
> > Cheers
> >
> > //Marcus
> >
> >
> > On Sun, Jun 28, 2009 at 11:26 PM, Mikhail Bautin <mbautin@gmail.com> wrote:
> >
> > > This is the way we deal with this problem, too. We put our jar files on
> > > NFS, and the attached patch makes it possible to add those jar files to the
> > > tasktracker classpath through a configuration property.
> > >
> > > Thanks,
> > > Mikhail
> > >
> > > On Sun, Jun 28, 2009 at 5:21 PM, Stuart White <stuart.white1@gmail.com> wrote:
> > >
> > >> Although I've never done it, I believe you could manually copy your jar
> > >> files out to your cluster somewhere in hadoop's classpath, and that would
> > >> remove the need for you to copy them to your cluster at the start of each
> > >> job.
> > >>
> > >> On Sun, Jun 28, 2009 at 4:08 PM, Marcus Herou <marcus.herou@tailsweep.com> wrote:
> > >>
> > >> > Hi.
> > >> >
> > >> > Running without a jobtracker makes the job start almost instantly.
> > >> > I think it is due to something with the classloader. I use a huge number of
> > >> > jar files, jobConf.set("tmpjars", "jar1.jar,jar2.jar")..., which need to be
> > >> > loaded every time, I guess.
> > >> >
> > >> > If I call conf.setNumTasksToExecutePerJvm(-1), will the TaskTracker child
> > >> > live forever then?
> > >> >
> > >> > Cheers
> > >> >
> > >> > //Marcus
> > >> >
> > >> > On Sun, Jun 28, 2009 at 9:54 PM, tim robertson <timrobertson100@gmail.com> wrote:
> > >> >
> > >> > > How long does it take to start the code locally in a single thread?
> > >> > >
> > >> > > Can you reuse the JVM so it only starts once per node per job?
> > >> > > conf.setNumTasksToExecutePerJvm(-1)
> > >> > >
> > >> > > Cheers,
> > >> > > Tim
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Sun, Jun 28, 2009 at 9:43 PM, Marcus Herou <marcus.herou@tailsweep.com> wrote:
> > >> > > > Hi.
> > >> > > >
> > >> > > > I wonder how one should improve the startup times of a hadoop job. Some of my
> > >> > > > jobs, which have a lot of dependencies in terms of many jar files, take a long
> > >> > > > time to start in hadoop, up to 2 minutes sometimes.
> > >> > > > The data input amounts in these cases are negligible, so it seems that Hadoop
> > >> > > > has a really high setup cost, which I can live with, but this seems too much.
> > >> > > >
> > >> > > > Let's say a job takes 10 minutes to complete; then it is bad if it takes 2
> > >> > > > mins to set it up... 20-30 sec max would be a lot more reasonable.
> > >> > > >
> > >> > > > Hints ?
> > >> > > >
> > >> > > > //Marcus
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Marcus Herou CTO and co-founder Tailsweep AB
> > >> > > > +46702561312
> > >> > > > marcus.herou@tailsweep.com
> > >> > > > http://www.tailsweep.com/
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Marcus Herou CTO and co-founder Tailsweep AB
> > >> > +46702561312
> > >> > marcus.herou@tailsweep.com
> > >> > http://www.tailsweep.com/
> > >> >
> > >>
> > >
> > >
> >
> >
>



-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
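
For readers following the thread: the patch itself is not included in this archive, so the snippet below is only a rough sketch of the general idea Mikhail describes (reading a comma-separated list of locally available jars from a configuration property and appending them to the child JVM classpath). The class name, the method name, and the property key `example.child.extra.jars` are all hypothetical and are not taken from Hadoop or from the patch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: NOT the actual TaskRunner.java patch discussed in this thread.
final class ChildClasspathSketch {

    // Hypothetical property key; the real patch's property name is not shown in the thread.
    static final String EXTRA_JARS_KEY = "example.child.extra.jars";

    // Appends each non-empty, comma-separated jar path to the base classpath.
    static String appendExtraJars(String baseClasspath, String extraJars, String sep) {
        List<String> entries = new ArrayList<>();
        if (baseClasspath != null && !baseClasspath.isEmpty()) {
            entries.add(baseClasspath);
        }
        if (extraJars != null) {
            for (String jar : extraJars.split(",")) {
                String trimmed = jar.trim();
                if (!trimmed.isEmpty()) {
                    entries.add(trimmed);
                }
            }
        }
        return String.join(sep, entries);
    }

    public static void main(String[] args) {
        // The default paths stand in for jars already present on every node (e.g. via NFS or rsync).
        String extra = System.getProperty(EXTRA_JARS_KEY, "/mnt/nfs/lib/a.jar,/mnt/nfs/lib/b.jar");
        System.out.println(appendExtraJars("job.jar", extra, File.pathSeparator));
    }
}
```

The point of such an approach, as described in the thread, is that jars already present on each node only need to be named on the classpath, rather than uploaded to HDFS at the start of every job as with tmpjars.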
