hadoop-common-user mailing list archives

From "Albert Chern" <albert.ch...@gmail.com>
Subject Re: Re: JAR packaging
Date Mon, 30 Oct 2006 20:46:04 GMT
I'm talking about a lib directory inside the actual job JAR.  Putting
the dependencies on every node doesn't seem like a good solution, since
you would have to copy everything over every time you need something
new, and sync them whenever there's an update.  You might even have to
restart the cluster, because I think the task runners use the same
classpath as their parent (the tasktracker), so if you add something new
in hadoop/lib it won't be picked up automagically.  Don't quote me on
that though.
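
To make the lib/ layout concrete, here is a minimal sketch, not a
definitive recipe.  It is written in Python purely for illustration (the
jar tool or an Ant target would do the same job), and the function names
and file paths are hypothetical.  It assumes your compiled classes sit in
one directory and each dependency is already a JAR: the classes go at the
root of the job JAR, and every dependency JAR is stored whole under lib/.

```python
import zipfile
from pathlib import Path


def build_job_jar(jar_path, classes_dir, dependency_jars):
    """Write a job JAR: compiled classes at the root, dependencies in lib/."""
    classes_dir = Path(classes_dir)
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        # Compiled classes keep their package-relative paths at the JAR root.
        for path in sorted(classes_dir.rglob("*")):
            if path.is_file():
                jar.write(path, path.relative_to(classes_dir).as_posix())
        # Each dependency JAR is stored as-is under lib/, not unpacked.
        for dep in dependency_jars:
            dep = Path(dep)
            jar.write(dep, "lib/" + dep.name)
    return jar_path


def list_bundled_libs(jar_path):
    """Return the dependency JARs found under lib/ inside the job JAR."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist()
            if name.startswith("lib/") and name.endswith(".jar")
        )
```

For example, build_job_jar("myjob.jar", "build/classes", ["deps/foo.jar"])
would produce a JAR containing lib/foo.jar next to your classes, which you
then submit with bin/hadoop jar as usual.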

On 10/30/06, Grant Ingersoll <grant.ingersoll@gmail.com> wrote:
> Do you actually mean a directory named lib in the Job JAR or do you
> mean by putting them in the lib directory where Hadoop runs?  From
> the looks of RunJar.java I think you mean the first option (of
> course, the second option works, too)
>
> -Grant
>
> On Oct 30, 2006, at 6:29 AM, Vetle Roeim wrote:
>
> > On Sat, 28 Oct 2006 22:13:35 +0200, Albert Chern
> > <albert.chern@gmail.com> wrote:
> >
> >> I'm not sure if the first option works.  If it does, let me know.
> >> One of the developers taught me to use option 2 by creating a jar
> >> with your dependencies in lib/.  The tasktrackers will automatically
> >> include everything in lib/ on their classpaths.
> >
> > Yeah, I ended up using this method as well, after getting
> > ClassNotFoundException on some instances. Haven't tried the first
> > method in a while, though.
> >
> >
> >> On 10/28/06, Grant Ingersoll <gsingers@apache.org> wrote:
> >>>
> >>> I'm not sure I am understanding this correctly and I don't see
> >>> anything on this in the Getting Started section, so...
> >>>
> >>> It seems that when I want to run my application in distributed mode,
> >>> I should invoke <hadoop_home>/bin/hadoop jar <jar> (or bin/hadoop
> >>> <main-class>), and it will copy my JAR onto the DFS and then
> >>> distribute it so the other nodes in the cluster can access it and
> >>> run it.
> >>>
> >>> Classpath-wise, there seem to be two options:
> >>>
> >>> 1. Have all the appropriate dependencies available so they are read
> >>> in by the startup commands and included in the classpath.  Does this
> >>> mean they all need to be on each node at startup time?
> >>>
> >>> 2. Create a single JAR made up of the contents of all the
> >>> dependencies.
> >>>
> >>> Also, the paths must be exactly the same on all the nodes, right?
> >>>
> >>> Is this correct or am I missing something?
> >>>
> >>> Thanks,
> >>> Grant
> >>>
> >
> >
> >
> > --
> > Vetle Roeim
> > Team Manager, Information Systems
> > Opera Software ASA <URL: http://www.opera.com/ >
>
> ------------------------------------------------------
> Grant Ingersoll
> http://www.grantingersoll.com/
>
>
>
