incubator-chukwa-user mailing list archives

From Kirk True <k...@mustardgrain.com>
Subject Re: Chukwa can't find Demux class
Date Thu, 29 Apr 2010 00:41:15 GMT
Hi all,

Just for grins I copied the Java source byte-for-byte to the Chukwa 
source folder and then ran:

    ant clean main && cp build/*.jar .


And it worked, as expected.

When one adds custom demux classes to a JAR, sticks it in 
hdfs://localhost:9000/chukwa/demux/mydemux.jar, is that JAR somehow 
magically merged with chukwa-core-0.4.0.jar to produce "job.jar" or do 
they remain separate?
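One way to check this empirically is to scan the localized job.jar for the class entry using only the JDK. A minimal sketch (the entry name is the XmlBasedDemux class from this thread; the stand-in temp JAR replaces the real path under /tmp/hadoop-kirk/.../jars/job.jar):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class JarScan {
    /** Returns true if the JAR at jarPath contains an entry ending in className. */
    static boolean containsClass(String jarPath, String className) throws Exception {
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                if (entries.nextElement().getName().endsWith(className)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny stand-in JAR so the sketch is self-contained; in
        // practice you'd point containsClass at the localized job.jar.
        File jar = File.createTempFile("job", ".jar");
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new ZipEntry(
                "org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class"));
            out.closeEntry();
        }
        System.out.println(containsClass(jar.getPath(), "XmlBasedDemux.class"));
    }
}
```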

Thanks,
Kirk

On 4/28/10 5:09 PM, Kirk True wrote:
> Hi Jerome,
>
> Yes, they're all using $JAVA_HOME which is 1.6.0_18.
>
> I did notice that the JAVA_PLATFORM environment variable in 
> chukwa-env.sh was set to 32-bit but Hadoop was defaulting to 64-bit 
> (this is a 64-bit machine), but setting that to Linux-amd64-64 didn't 
> make any difference.
>
> Thanks,
> Kirk
>
> On 4/28/10 4:00 PM, Jerome Boulon wrote:
>> Are you using the same version of Java for your jar and Hadoop?
>> /Jerome.
>>
>> On 4/28/10 3:33 PM, "Kirk True" <kirk@mustardgrain.com> wrote:
>>
>>     Hi Eric,
>>
>>     I added these to Hadoop's mapred-site.xml:
>>
>>
>>     <property>
>>     <name>keep.failed.task.files</name>
>>     <value>true</value>
>>     </property>
>>     <property>
>>     <name>mapred.job.tracker.persist.jobstatus.active</name>
>>     <value>true</value>
>>     </property>
>>
>>
>>     This seems to have caused the task tracker directory to stick
>>     around after the job is complete. So, for example, I have this
>>     directory:
>>
>>
>>     /tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001
>>
>>
>>     Under this directory I have the following files:
>>
>>
>>     jars/
>>     job.jar
>>     org/ . . .
>>     job.xml
>>
>>     My Demux (XmlBasedDemux) doesn't appear in the job.jar or in the
>>     jars/org/... directory (apparently the exploded job.jar). However,
>>     my demux JAR appears in three places in the job.xml:
>>
>>
>>     <property>
>>     <name>mapred.job.classpath.files</name>
>>     <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>>     </property>
>>     <property>
>>     <name>mapred.jar</name>
>>     <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
>>     </property>
>>     <property>
>>     <name>mapred.cache.files</name>
>>     <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>>     </property>
>>
>>
>>     So it looks like the call to DistributedCache.addFileToClassPath
>>     in Demux.addParsers is working, as the above job conf properties
>>     include my JAR.
>>
>>     Here's my JAR contents:
>>
>>
>>     [kirk@skinner data-collection]$ unzip -l
>>     data-collection-demux/target/data-collection-demux-0.1.jar
>>     Archive:  data-collection-demux/target/data-collection-demux-0.1.jar
>>       Length     Date   Time    Name
>>      --------    ----   ----    ----
>>             0  04-28-10 15:19   META-INF/
>>           123  04-28-10 15:19   META-INF/MANIFEST.MF
>>             0  04-28-10 15:19   org/
>>             0  04-28-10 15:19   org/apache/
>>             0  04-28-10 15:19   org/apache/hadoop/
>>             0  04-28-10 15:19   org/apache/hadoop/chukwa/
>>             0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/
>>             0  04-28-10 15:19
>>       org/apache/hadoop/chukwa/extraction/demux/
>>             0  04-28-10 15:19
>>       org/apache/hadoop/chukwa/extraction/demux/processor/
>>             0  04-28-10 15:19
>>       org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
>>          1697  04-28-10 15:19
>>       org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
>>             0  04-28-10 15:19   META-INF/maven/
>>             0  04-28-10 15:19
>>       META-INF/maven/com.cisco.flip.datacollection/
>>             0  04-28-10 15:19
>>       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
>>          1448  04-28-10 00:23
>>       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
>>           133  04-28-10 15:19
>>       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
>>      --------                   -------
>>          3401                   16 files
>>
>>
>>     Here's how I'm copying the JAR into HDFS:
>>
>>
>>     hadoop fs -mkdir /chukwa/demux
>>     hadoop fs -copyFromLocal /path/to/data-collection-demux-0.1.jar
>>     /chukwa/demux
>>
>>     Any ideas of more things to try?
>>
>>     Thanks,
>>     Kirk
>>
>>
>>     On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang"
>>     <eyang@yahoo-inc.com> wrote:
>>     > Kirk,
>>     >
>>     > The shell script and job-related information are stored
>>     temporarily in
>>     > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xxx/
>>     > while the job is running.
>>     >
>>     > You should go into the jars directory and find out if the
>>     compressed jar
>>     > contains your class file.
>>     >
>>     > Regards,
>>     > Eric
>>     >
>>     > On 4/28/10 1:57 PM, "Kirk True" <kirk@mustardgrain.com> wrote:
>>     >
>>     > > Hi Eric,
>>     > >
>>     > > I updated MapProcessorFactory.getProcessor to dump the URLs
>>     from the
>>     > > URLClassLoader from the MapProcessorFactory.class. This is
>>     what I see:
>>     > >
>>     > >
>>     > > file:/home/kirk/bin/hadoop-0.20.2/conf/
>>     > > file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/
>>     > > file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
>>     > >
>>     file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
>>     > >
>>     file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/
>>     > > attempt_201004281320_0001_m_000000_0/work/
>>     > >
>>     file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/
>>     > > jars/classes
>>     > >
>>     file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/
>>     > > jars/
>>     > >
>>     file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/
>>     > > attempt_201004281320_0001_m_000000_0/work/
>>     > >
>>     > >
>>     > > Is that the expected classpath? I don't see any reference to
>>     my JAR or the
>>     > > Chukwa JARs.
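For reference, the debug dump described above can be sketched against a URLClassLoader built by hand; the real code would presumably use MapProcessorFactory.class.getClassLoader() (which on Hadoop 0.20 / JDK 6 is a URLClassLoader), and the two URLs below are just sample entries from the listing above:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class DumpClasspath {
    public static void main(String[] args) throws Exception {
        // In the real debug code the loader would come from
        // MapProcessorFactory.class.getClassLoader(); here we build one
        // explicitly so the sketch runs anywhere.
        URL[] urls = {
            new URL("file:/home/kirk/bin/hadoop-0.20.2/conf/"),
            new URL("file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar")
        };
        try (URLClassLoader loader = new URLClassLoader(urls)) {
            for (URL url : loader.getURLs()) {
                System.out.println(url);
            }
        }
    }
}
```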
>>     > >
>>     > > Also, when I try to view the contents of my
>>     "job_<timestamp>_0001" directory,
>>     > > it's automatically removed, so I can't really do any forensics
>>     after the fact.
>>     > > I know this is probably a Hadoop question, but is it possible to
>>     prevent that
>>     > > auto-removal from occurring?
>>     > >
>>     > > Thanks,
>>     > > Kirk
>>     > >
>>     > > On Wed, 28 Apr 2010 13:16 -0700, "Kirk True"
>>     <kirk@mustardgrain.com> wrote:
>>     > >> Hi Eric,
>>     > >>
>>     > >> On 4/28/10 10:23 AM, Eric Yang wrote:
>>     > >>> Hi Kirk,
>>     > >>>
>>     > >>> Is the ownership of the jar file set up correctly as the user
>>     that runs
>>     > >>> demux?
>>     > >>
>>     > >> When browsing via the NameNode web UI, it lists permissions of
>>     > >> "rw-r--r--" and "kirk" as the owner (which is also the user
>>     ID running
>>     > >> the Hadoop and Chukwa processes).
>>     > >>
>>     > >>>    You may find more information by looking at a running
>>     mapper task or
>>     > >>> reducer task, and trying to find the task attempt shell script.
>>     > >>
>>     > >> Where is the task attempt shell script located?
>>     > >>
>>     > >>>    Make sure
>>     > >>> the files are downloaded correctly from distributed cache,
>>     and referenced in
>>     > >>> the locally generated jar file.  Hope this helps.
>>     > >>>
>>     > >>
>>     > >> Sorry for asking such basic questions, but where is the locally
>>     > >> generated JAR file found? I'm assuming under
>>     /tmp/hadoop-<user>, by
>>     > >> default? I saw one file named job_<timestamp>.jar but it
>>     appeared to be a
>>     > >> byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my
>>     "XmlBasedDemux"
>>     > >> class was nowhere to be found.
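A quick way to confirm the byte-for-byte-copy suspicion is to compare checksums of the two JARs. A sketch with in-memory stand-ins for the real files (in practice the streams would be FileInputStreams over job_<timestamp>.jar and chukwa-core-0.4.0.jar):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class JarCompare {
    /** Hex MD5 of a stream's contents. */
    static String md5(InputStream in) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buf = new byte[8192];
            while (dis.read(buf) != -1) { /* just drain the stream */ }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the two JAR files being compared.
        byte[] contents = "identical jar bytes".getBytes("UTF-8");
        String a = md5(new ByteArrayInputStream(contents));
        String b = md5(new ByteArrayInputStream(contents.clone()));
        System.out.println(a.equals(b) ? "byte-for-byte copy" : "different");
    }
}
```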
>>     > >>
>>     > >> Thanks,
>>     > >> Kirk
>>     > >>
>>     > >>> Regards,
>>     > >>> Eric
>>     > >>>
>>     > >>> On 4/28/10 9:37 AM, "Kirk True" <kirk@mustardgrain.com> wrote:
>>     > >>>
>>     > >>>
>>     > >>>> Hi guys,
>>     > >>>>
>>     > >>>> I have a custom Demux that I need to run to process my
>>     input, but I'm
>>     > >>>> getting
>>     > >>>> ClassNotFoundException when running in Hadoop. This is with
>>     the released 0.4.0 build.
>>     > >>>>
>>     > >>>> I've done the following:
>>     > >>>>
>>     > >>>> 1. I put my Demux class in the correct package
>>     > >>>> (org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
>>     > >>>> 2. I've added the JAR containing the Demux implementation
>>     to HDFS at
>>     > >>>> /chukwa/demux
>>     > >>>> 3. I've added an alias to it in chukwa-demux-conf.xml
>>     > >>>>
>>     > >>>> The map/reduce job is picking up on the fact that I have a
>>     custom Demux and is trying to load it, but I get a
>>     ClassNotFoundException. The HDFS-based URL to the JAR is showing
>>     up in the job configuration in Hadoop, which is further evidence
>>     that Chukwa and Hadoop know where the JAR lives and that it's
>>     part of the Chukwa-initiated job.
>>     > >>>>
>>     > >>>> My Demux is very simple. I've stripped it down to a
>>     System.out.println with no dependencies on classes/JARs other
>>     than Chukwa, Hadoop, and the core JDK. I've double-checked that
>>     my JAR is being built correctly. I'm completely flummoxed as to
>>     what I'm doing wrong.
>>     > >>>>
>>     > >>>> Any ideas what I'm missing? What other information can I
>>     provide?
>>     > >>>>
>>     > >>>> Thanks!
>>     > >>>> Kirk
>>     > >>>>
>>     > >>>>
>>     > >>>
>>     > >>
>>     > >
>>     > >
>>     >
>>     >
>>
>>
