Subject: Re: Too large class path for map reduce jobs
From: Henning Blohm
To: mapreduce-user@hadoop.apache.org
Date: Fri, 08 Oct 2010 09:52:53 +0200

Ahh... that could indeed be the case. Yes, my issue was about "large" rather than "long".

Thanks for clarifying!

Henning

On Thu, 2010-10-07 at 13:27 -0700, Tom White wrote:
I wonder if there is a misunderstanding here - the problem is that the
classpath has too many classes on it (and clashes with user classes),
rather than it being a text string which is too long.

I would suggest that the technical discussion of how to fix this goes
onto the JIRA.

Cheers,
Tom

On Thu, Oct 7, 2010 at 1:23 AM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
> Well, if the issue is a too-long classpath, the softlink thingy will give
> some room to breathe, as the total CP length will be much smaller.
>
> A
> On Thu, Oct 7, 2010 at 3:43 PM, Henning Blohm <henning.blohm@zfabrik.de>
> wrote:
>>
>> So that's actually another issue, right? Besides splitting the classpath
>> into those three groups, you want the TT to create soft-links on demand to
>> simplify the computation of the classpath string. Is that right?
>>
>> But it's the TT that actually starts the job VM. Why does it matter what
>> the string actually looks like, as long as it has the right content?
>>
>> Thanks,
>>   Henning
>>
>> On Thu, 2010-10-07 at 13:22 +0800, Alejandro Abdelnur wrote:
>>
>> [sent too soon]
>>
>> The first CP shown is how the CP of a task looks today. If we change it to
>> pick up all the job JARs from the current dir, then the classpath will be
>> much shorter (second CP shown). We can easily achieve this by soft-linking
>> the job JARs into the work dir of the task.
>>
>> Alejandro
>>
>> On Thu, Oct 7, 2010 at 1:02 PM, Alejandro Abdelnur <tucu@cloudera.com>
>> wrote:
>>
>> Fragmentation of Hadoop classpaths is another issue: Hadoop should
>> differentiate the CP into 3:
>>
>> 1. client CP: what is needed to submit a job (only the nachos)
>>
>> 2. server CP (JT/NN/TT/DN): what is needed to run the cluster (the whole
>> enchilada)
>>
>> 3. job CP: what is needed to run a job (some of the enchilada)
>>
>>
>> But I'm not trying to get into that here. What I'm suggesting is:
>>
>>
>>
>> -----
>>
>> # Hadoop JARs:
>>
>> /Users/tucu/dev-apps/hadoop/conf
>>
>>
>> /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/lib/tools.jar
>>
>> /Users/tucu/dev-apps/hadoop/bin/..
>>
>> /Users/tucu/dev-apps/hadoop/bin/../hadoop-core-0.20.3-CDH3-SNAPSHOT.jar
>>
>> /Users/tucu/dev-apps/hadoop/bin/../lib/aspectjrt-1.6.5.jar
>>
>> ..... (about 30 jars from hadoop lib/ )
>>
>> /Users/tucu/dev-apps/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>
>> # Job JARs (for a job with only 2 JARs):
>>
>>
>> /Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/-2707763075630339038_639898034_1993697040/localhost/user/tucu/oozie-tucu/0000003-101004184132247-oozie-tucu-W/java-node--java/java-launcher.jar
>>
>>
>> /Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/3613772770922728555_-588832047_1993624983/localhost/user/tucu/examples/apps/java-main/lib/oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar
>>
>>
>> /Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/tucu/jobcache/job_201010041326_0058/attempt_201010041326_0058_m_000000_0/work
>>
>> -----
>>
>>
>>
>> What I'm suggesting is that the latter group, the job JARs, be soft-linked
>> (by the TT) into the working directory; then their classpath is just:
>>
>> -----
>>
>> java-launcher.jar
>>
>> oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar
>>
>> .
>>
>> -----
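
As a rough sketch of the TT-side change being proposed (illustrative only; the
method and the paths are made up, not actual TaskTracker code), soft-linking the
job JARs into the task work dir and emitting bare JAR names could look like this:

-----
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LinkJobJars {

  // Soft-link each job JAR into the task work dir and return a classpath
  // made of bare JAR names plus "." for the work dir itself.
  static String linkAndBuildClasspath(Path workDir, List<Path> jobJars)
      throws Exception {
    List<String> entries = new ArrayList<String>();
    for (Path jar : jobJars) {
      Path link = workDir.resolve(jar.getFileName());
      if (!Files.exists(link)) {
        // e.g. work/java-launcher.jar -> .../distcache/.../java-launcher.jar
        Files.createSymbolicLink(link, jar);
      }
      entries.add(jar.getFileName().toString());
    }
    entries.add(".");  // keep the work dir itself on the CP
    return String.join(File.pathSeparator, entries);
  }
}
-----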
>>
>>
>>
>>
>> Alejandro
>>
>> On Wed, Oct 6, 2010 at 7:57 PM, Henning Blohm <henning.blohm@zfabrik.de>
>> wrote:
>>
>> Hi Alejandro,
>>
>>    yes, it can of course be done right (sorry if my wording seemed to
>> imply otherwise). Just saying that I think that Hadoop M/R should not go
>> into that class loader / module separation business. It's one Job, one VM,
>> right? So the problem is to assign just the stuff needed to let the Job do
>> its business without becoming an obstacle.
>>
>>   Must admit I didn't understand your proposal 2. How would that remove
>> (e.g.) jetty libs from the job's classpath?
>>
>> Thanks,
>>   Henning
>>
>> Am Mittwoch, den 06.10.2010, 18:28 +0800 schrieb Alejandro Abdelnur:
>>
>> 1. Classloader business can be done right. Actually it could be done as
>> spec-ed for servlet web-apps.
>>
>>
>> 2. If the issue is strictly 'too large a classpath', then a simpler solution
>> would be to soft-link all JARs into the current directory and create the
>> classpath with the JAR names only (no path). Note that the soft-linking
>> business is already supported by the DistributedCache. So the changes would
>> be mostly in the TT, to create the JAR-names-only classpath before starting
>> the child.
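
The soft-linking support mentioned here can be illustrated with a minimal sketch
(the HDFS path is hypothetical): the fragment after '#' names the soft-link that
the framework creates in the task's working directory.

-----
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;

public class CacheSymlinkExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask the framework to soft-link cached files into the task's work dir.
    DistributedCache.createSymlink(conf);
    // "mylib.jar" (the part after '#') becomes a link in the work dir, so the
    // short name alone is enough as a classpath entry.
    DistributedCache.addCacheFile(
        new URI("hdfs://namenode/user/tucu/lib/mylib.jar#mylib.jar"), conf);
  }
}
-----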
>>
>>
>> Alejandro
>>
>>
>> On Wed, Oct 6, 2010 at 5:57 PM, Henning Blohm <henning.blohm@zfabrik.de>
>> wrote:
>>
>> Hi Tom,
>>
>>   that's exactly it. Thanks! I don't think that I can comment on the
>> issues in Jira so I will do it here.
>>
>>   Tricks with class paths and deviating from the default class loading
>> delegation have never been anything but short-term relief. Fixing things
>> by imposing a "better" order of stuff on the class path will not work when
>> people actually do use child loaders (as the parent wins) - like we do. Also
>> it may easily lead to very confusing situations, because an earlier part of
>> the class path is incomplete and picks up other stuff from a later part, etc.
>> etc.... no good.
>>
>>   Child loaders are good for module separation but should not be used to
>> "hide" type visibility from the parent. That almost certainly leads to class
>> loader constraint violations - once you lose control (which is usually earlier
>> than expected).
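
For illustration, a minimal child loader of the kind meant here might look like
the sketch below (the JAR path and class name are hypothetical). With a parent
supplied, the standard URLClassLoader delegation asks the parent first ("the
parent wins"), so types already on the Hadoop classpath cannot be hidden from
the job this way.

-----
import java.net.URL;
import java.net.URLClassLoader;

public class JobChildLoader {
  public static void main(String[] args) throws Exception {
    // JARs visible only to the job's module (hypothetical path).
    URL[] jobJars = { new URL("file:///path/to/job/mylib.jar") };
    ClassLoader parent = JobChildLoader.class.getClassLoader();
    // Default delegation: the parent is consulted before the child's JARs.
    ClassLoader child = new URLClassLoader(jobJars, parent);
    Class<?> main = Class.forName("com.example.JobMain", true, child);
    System.out.println("Loaded " + main.getName() + " via " + main.getClassLoader());
  }
}
-----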
>>
>>   The suggestion to reduce the Job class path to the required minimum is
>> the most practical approach. There is some gray area there of course and it
>> will not be feasible to reach the absolute minimal set of types there - but
>> something reasonable, i.e. the hadoop core that suffices to run the job.
>> Certainly jetty & co are not required for job execution (btw. I "hacked"
>> 0.20.2 to remove anything in "server/" from the classpath before setting the
>> job class path).
>>
>>   I would suggest to
>>
>>   a) introduce some HADOOP_JOB_CLASSPATH var that, if set, is the
>> additional classpath, added to the "core" classpath (as described above). If
>> not set, for compatibility, preserve today's behavior (see the sketch after
>> this list).
>>   b) not get into custom child loaders for jobs as part of Hadoop M/R.
>> It's non-trivial to get right and feels to be beyond scope.
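
A rough sketch of what (a) could amount to in the code that launches the child
VM (not actual TaskRunner code; HADOOP_JOB_CLASSPATH is the variable proposed
above, while the trimmed coreClasspath value is an assumption):

-----
import java.io.File;

public class JobClasspathSketch {
  // Build the classpath for the child task VM: a trimmed "core" classpath
  // plus the optional, user-controlled HADOOP_JOB_CLASSPATH. When the
  // variable is not set, fall back to the parent VM's full classpath,
  // preserving today's behavior.
  static String childClasspath(String coreClasspath) {
    String jobCp = System.getenv("HADOOP_JOB_CLASSPATH");
    if (jobCp == null || jobCp.isEmpty()) {
      return System.getProperty("java.class.path");
    }
    return coreClasspath + File.pathSeparator + jobCp;
  }
}
-----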
>>
>>   I wouldn't mind helping btw.
>>
>> Thanks,
>>   Henning
>>
>>
>>
>>
>> On Tue, 2010-10-05 at 15:59 -0700, Tom White wrote:
>>
>> Hi Henning,
>>
>> I don't know if you've seen
>> https://issues.apache.org/jira/browse/MAPREDUCE-1938 and
>> https://issues.apache.org/jira/browse/MAPREDUCE-1700 which have
>> discussion about this issue.
>>
>> Cheers
>> Tom
>>
>> On Fri, Sep 24, 2010 at 3:41 AM, Henning Blohm <henning.blohm@zfabrik.de>
>> wrote:
>> > Short update on the issue:
>> >
>> > I tried to find a way to separate class path configurations by modifying
>> > the scripts in HADOOP_HOME/bin, but found that TaskRunner actually copies
>> > the class path setting from the parent process when starting a local task,
>> > so I do not see a way of having less on a job's classpath without
>> > modifying Hadoop.
>> >
>> > As that will present a real issue when running our jobs on Hadoop, I would
>> > like to propose changing TaskRunner so that it sets a class path
>> > specifically for M/R tasks. That class path could be defined in the scripts
>> > (as for the other processes) using a particular environment variable (e.g.
>> > HADOOP_JOB_CLASSPATH). It could default to the current VM's class path,
>> > preserving today's behavior.
>> >
>> > Is it ok to enter this as an issue?
>> >
>> > Thanks,
>> >   Henning
>> >
>> >
>> > Am Freitag, den 17.09.2010, 16:01 +0000 schrieb Allen Wittenauer:
>> >
>> > On Sep 17, 2010, at 4:56 AM, Henning Blohm wrote:
>> >
>> >> When running map reduce tasks in Hadoop I run into classpath issues.
>> >> Contrary to previous posts, my problem is not that I am missing classes
>> >> on the Task's class path (we have a perfect solution for that) but rather
>> >> that I find too many (e.g. ECJ classes or jetty).
>> >
>> > The fact that you mention:
>> >
>> >> The libs in HADOOP_HOME/lib seem to contain everything needed to run
>> >> anything in Hadoop which is, I assume, much more than is needed to run
>> >> a map
>> >> reduce task.
>> >
>> > hints that your perfect solution is to throw all your custom stuff in
>> > lib.
>> > If so, that's a huge mistake.  Use distributed cache instead.
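
One way to ship job-specific JARs through the distributed cache rather than
HADOOP_HOME/lib is sketched below (illustrative only; the HDFS path is made up):

-----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class ShipJobJar {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Distribute a job-specific JAR and have it added to the task classpath,
    // instead of dropping it into HADOOP_HOME/lib on every node.
    DistributedCache.addFileToClassPath(
        new Path("/user/henning/lib/my-udf.jar"), conf);
    // ... then submit the job with this Configuration as usual.
  }
}
-----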
>> >
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
