incubator-chukwa-user mailing list archives

From "Kirk True" <k...@mustardgrain.com>
Subject Re: Chukwa can't find Demux class
Date Wed, 28 Apr 2010 22:33:46 GMT
Hi Eric,

I added these to Hadoop's mapred-site.xml:


<!-- Keep per-task files on disk after the job finishes (debugging aid). -->
<property>
     <name>keep.failed.task.files</name>
     <value>true</value>
</property>
<!-- Persist job status information so it survives job completion. -->
<property>
     <name>mapred.job.tracker.persist.jobstatus.active</name>
     <value>true</value>
</property>


This seems to have caused the task tracker directory to stick
around after the job is complete. So, for example, I have this
directory:


/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001


Under this directory I have the following files:


jars/
job.jar
org/ . . .
job.xml

My Demux class (XmlBasedDemux) doesn't appear in job.jar or in the
jars/org/... directory (apparently job.jar's exploded contents). However,
my demux JAR does appear in the job.xml:


<property>
    <name>mapred.job.classpath.files</name>
    <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
</property>
<property>
    <name>mapred.jar</name>
    <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
</property>
<property>
    <name>mapred.cache.files</name>
    <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
</property>


So it looks like Demux.addParsers's call to
DistributedCache.addFileToClassPath is working, since the job
configuration properties above include my JAR.
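
For reference, here's a minimal sketch of what I understand that code path
to be doing (the class and method names around the call are my assumption,
not a paste of Chukwa's actual Demux.addParsers body, but
DistributedCache.addFileToClassPath is the stock Hadoop 0.20 API, and it
populates the mapred.job.classpath.files and mapred.cache.files properties
shown above):


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class AddDemuxJarSketch {
    // Sketch only: register a jar already sitting in HDFS so the
    // JobTracker ships it to each task's classpath.
    public static void addDemuxJar(Configuration conf) throws java.io.IOException {
        Path jar = new Path(
            "hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar");
        DistributedCache.addFileToClassPath(jar, conf);
    }
}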

Here's my JAR contents:


[kirk@skinner data-collection]$ unzip -l data-collection-demux/target/data-collection-demux-0.1.jar
Archive:  data-collection-demux/target/data-collection-demux-0.1.jar
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  04-28-10 15:19   META-INF/
      123  04-28-10 15:19   META-INF/MANIFEST.MF
        0  04-28-10 15:19   org/
        0  04-28-10 15:19   org/apache/
        0  04-28-10 15:19   org/apache/hadoop/
        0  04-28-10 15:19   org/apache/hadoop/chukwa/
        0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/
        0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/
        0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/
        0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
     1697  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
        0  04-28-10 15:19   META-INF/maven/
        0  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/
        0  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
     1448  04-28-10 00:23   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
      133  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
 --------                   -------
     3401                   16 files
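
For completeness, the class itself is trivial. It's shaped roughly like
this (a sketch assuming Chukwa 0.4.0's AbstractProcessor base class, not a
paste of my actual source):


package org.apache.hadoop.chukwa.extraction.demux.processor.mapper;

import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class XmlBasedDemux extends AbstractProcessor {
    // Stripped down to a trace while chasing the ClassNotFoundException;
    // no dependencies beyond Chukwa, Hadoop, and the core JDK.
    @Override
    protected void parse(String recordEntry,
                         OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
                         Reporter reporter) throws Throwable {
        System.out.println("XmlBasedDemux invoked");
    }
}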


Here's how I'm copying the JAR into HDFS:


hadoop fs -mkdir /chukwa/demux
hadoop fs -copyFromLocal /path/to/data-collection-demux-0.1.jar /chukwa/demux
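
(Equivalently, via the Hadoop FileSystem API; just a sketch using the same
paths as the shell commands above:)


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyDemuxJar {
    public static void main(String[] args) throws Exception {
        // Same effect as the two shell commands above: create the target
        // directory in HDFS and copy the demux jar into it.
        FileSystem fs = FileSystem.get(new Configuration());
        fs.mkdirs(new Path("/chukwa/demux"));
        fs.copyFromLocalFile(new Path("/path/to/data-collection-demux-0.1.jar"),
                             new Path("/chukwa/demux"));
    }
}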

Any ideas of more things to try?

Thanks,
Kirk
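
P.S. In case it helps anyone reproduce this, the classpath dump in the
quoted exchange below came from a diagnostic along these lines (a sketch of
what I patched into MapProcessorFactory.getProcessor, not Chukwa's shipping
code):


import java.net.URL;
import java.net.URLClassLoader;

public class ClasspathDump {
    // Print every URL visible to the class loader that loaded clazz.
    // Called with MapProcessorFactory.class to produce the dump below.
    public static void dump(Class<?> clazz) {
        ClassLoader cl = clazz.getClassLoader();
        if (cl instanceof URLClassLoader) {
            for (URL url : ((URLClassLoader) cl).getURLs()) {
                System.out.println(url);
            }
        }
    }
}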


On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang" <eyang@yahoo-inc.com> wrote:
> Kirk,
>
> The shell script and job related information are stored temporarily in
> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xxx/,
> while the job is running.
>
> You should go into the jars directory and find out if the compressed jar
> contains your class file.
>
> Regards,
> Eric
>
> On 4/28/10 1:57 PM, "Kirk True" <kirk@mustardgrain.com> wrote:
>
> > Hi Eric,
> >
> > I updated MapProcessorFactory.getProcessor to dump the URLs from
> > MapProcessorFactory.class's URLClassLoader. This is what I see:
> >
> >
> > file:/home/kirk/bin/hadoop-0.20.2/conf/
> > file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
> > file:/home/kirk/bin/hadoop-0.20.2/
> > file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
> > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
> > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/classes
> > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/
> > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
> >
> >
> > Is that the expected classpath? I don't see any reference to my JAR or
> > the Chukwa JARs.
> >
> > Also, when I try to view the contents of my "job_<timestamp>_0001"
> > directory, it's automatically removed, so I can't really do any
> > forensics after the fact. I know this is probably a Hadoop question,
> > but is it possible to prevent that auto-removal from occurring?
> >
> > Thanks,
> > Kirk
> >
> > On Wed, 28 Apr 2010 13:16 -0700, "Kirk True" <kirk@mustardgrain.com> wrote:
> >> Hi Eric,
> >>
> >> On 4/28/10 10:23 AM, Eric Yang wrote:
> >>> Hi Kirk,
> >>>
> >>> Is the ownership of the jar file set up correctly as the user that
> >>> runs demux?
> >>
> >> When browsing via the NameNode web UI, it lists permissions of
> >> "rw-r--r--" and "kirk" as the owner (which is also the user ID running
> >> the Hadoop and Chukwa processes).
> >>
> >>> You may find more information by looking at the running mapper or
> >>> reducer task, and trying to find the task attempt shell script.
> >>
> >> Where is the task attempt shell script located?
> >>
> >>> Make sure the files are downloaded correctly from the distributed
> >>> cache, and referenced in the locally generated jar file. Hope this
> >>> helps.
> >>>
> >>
> >> Sorry for asking such basic questions, but where is the locally
> >> generated JAR file found? I'm assuming under /tmp/hadoop-<user>, by
> >> default? I saw one file named job_<timestamp>.jar, but it appeared to
> >> be a byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my
> >> "XmlBasedDemux" class was nowhere to be found.
> >>
> >> Thanks,
> >> Kirk
> >>
> >>> Regards,
> >>> Eric
> >>>
> >>> On 4/28/10 9:37 AM, "Kirk True" <kirk@mustardgrain.com> wrote:
> >>>
> >>>
> >>>> Hi guys,
> >>>>
> >>>> I have a custom Demux that I need to run to process my
input, but I'm
> >>>> getting
> >>>> ClassNotFoundException when running in Hadoop. This is
with the released
> >>>> 0.4.0
> >>>> build.
> >>>>
> >>>> I've done the following:
> >>>>
> >>>> 1. I put my Demux class in the correct package
> >>>>
(org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
> >>>> 2. I've added the JAR containing the Demux implementation
to HDFS at
> >>>> /chuka/demux
> >>>> 3. I've added an alias to it in chukwa-demux-conf.xml
> >>>>
> >>>> The map/reduce job is picking up on the fact that I have a
custom Demux and
> >>>> is
> >>>> trying to load it, but I get a ClassNotFoundException. The
HDFS-based URL
> >>>> to
> >>>> the JAR is showing up in the job configuration in Hadoop,
which is another
> >>>> evidence that Chukwa and Hadoop know where the JAR lives
and that it's part
> >>>> of
> >>>> the Chukwa-initiated job.
> >>>>
> >>>> My Demux is very simple. I've stripped it down to a
System.out.println with
> >>>> dependencies on no other classes/JARs other than Chukwa,
Hadoop, and the
> >>>> core
> >>>> JDK. I've double-checked that my JAR is being built up
correctly. I'm
> >>>> completely flummoxed as to what I'm doing wrong.
> >>>>
> >>>> Any ideas what I'm missing? What other information can I
provide?
> >>>>
> >>>> Thanks!
> >>>> Kirk
> >>>>
> >>>>
> >>>
> >>
> >
> >
>
>
