incubator-chukwa-user mailing list archives

From Kirk True <k...@mustardgrain.com>
Subject Re: Chukwa can't find Demux class - POSSIBLE FIX
Date Thu, 29 Apr 2010 01:50:20 GMT
Hi Eric,

If I grep "hdfs://" in $CHUKWA_HOME/conf, the string shows up in two 
places: one is in the README and the other is in 
chukwa-collector-conf.xml for the writer.hdfs.filesystem property. I 
didn't change this file, so that should be the default. 
chukwa-common.xml's chukwa.data.dir is still just "/chukwa".
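
For reference, the kludge in Demux.addParsers boils down to dropping the scheme and authority from the URI-form path before handing it to DistributedCache.addFileToClassPath. A minimal sketch using only java.net.URI — the surrounding Demux/DistributedCache calls are omitted, and the hard-coded URL is just an illustration:

```java
import java.net.URI;

public class StripScheme {
    // Convert a URI-form HDFS path into the plain filesystem path
    // that DistributedCache.addFileToClassPath expects.
    static String toFsPath(String uriForm) {
        // URI.getPath() drops the "hdfs://host:port" scheme/authority
        // and keeps only the path component.
        return URI.create(uriForm).getPath();
    }

    public static void main(String[] args) {
        String in = "hdfs://localhost:9000/chukwa/demux/mydemux.jar";
        System.out.println(toFsPath(in));  // prints /chukwa/demux/mydemux.jar
    }
}
```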

Thanks,
Kirk

On 4/28/10 6:34 PM, Eric Yang wrote:
> Hi Kirk,
>
> Check chukwa-common.xml and make sure that chukwa.data.dir does not 
> have hdfs://localhost:9000 prepended to it.  It's best to leave the 
> namenode address out of this path for portability.
>
> Regards,
> Eric
>
>
> On 4/28/10 6:19 PM, "Kirk True" <kirk@mustardgrain.com> wrote:
>
>     Hi all,
>
>     The problem seems to stem from the fact that the call to
>     DistributedCache.addFileToClassPath is passing in a Path that is
>     in URI form, i.e. hdfs://localhost:9000/chukwa/demux/mydemux.jar
>     whereas the DistributedCache API expects it to be a
>     filesystem-based path (i.e. /chukwa/demux/mydemux.jar). I'm not
>     sure why, but the FileStatus object returned by
>     FileSystem.listStatus is returning a URL-based path instead of a
>     filesystem-based path.
>
>     I kludged the Demux class' addParsers to strip the
>     "hdfs://localhost:9000" portion of the string and now my class is
>     found.
>
>     It's frustrating when stuff silently fails :) I even turned up the
>     logging in Hadoop and Chukwa to TRACE and nothing was reported.
>
>     So, my question is, do I have something misconfigured that causes
>     FileSystem.listStatus to return a URL-based path? Or does the code
>     need to be changed?
>
>     Thanks,
>     Kirk
>
>     On 4/28/10 5:41 PM, Kirk True wrote:
>
>         Hi all,
>
>         Just for grins I copied the Java source byte-for-byte to the
>         Chukwa source folder and then ran:
>
>
>             ant clean main && cp build/*.jar .
>
>
>         And it worked, as expected.
>
>         When one adds custom demux classes to a JAR, sticks it in
>         hdfs://localhost:9000/chukwa/demux/mydemux.jar, is that JAR
>         somehow magically merged with chukwa-core-0.4.0.jar to produce
>         "job.jar" or do they remain separate?
>
>         Thanks,
>         Kirk
>
>         On 4/28/10 5:09 PM, Kirk True wrote:
>
>              Hi Jerome,
>
>             Yes, they're all using $JAVA_HOME which is 1.6.0_18.
>
>             I did notice that the JAVA_PLATFORM environment variable
>             in chukwa-env.sh was set to 32-bit but Hadoop was
>             defaulting to 64-bit (this is a 64-bit machine), but
>             setting that to Linux-amd64-64 didn't make any difference.
>
>             Thanks,
>             Kirk
>
>             On 4/28/10 4:00 PM, Jerome Boulon wrote:
>
>                 Are you using the same version of Java for your jar
>                 and Hadoop?
>                 /Jerome.
>
>                 On 4/28/10 3:33 PM, "Kirk True"
>                 <kirk@mustardgrain.com> wrote:
>
>
>                     Hi Eric,
>
>                     I added these to Hadoop's mapred-site.xml:
>
>
>                     <property>
>                       <name>keep.failed.task.files</name>
>                       <value>true</value>
>                     </property>
>                     <property>
>                       <name>mapred.job.tracker.persist.jobstatus.active</name>
>                       <value>true</value>
>                     </property>
>
>
>                     This seems to have caused the task tracker
>                     directory to stick around after the job is
>                     complete. So, for example, I have this directory:
>
>
>                     /tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001
>
>
>                     Under this directory I have the following files:
>
>
>                     jars/
>                     job.jar
>                     org/ . . .
>                     job.xml
>
>                     My Demux (XmlBasedDemux) doesn't appear in
>                     job.jar or in the jars/org/... directory
>                     (apparently the exploded job.jar). However, my
>                     demux JAR appears in three places in job.xml:
>
>
>                     <property>
>                       <name>mapred.job.classpath.files</name>
>                       <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>                     </property>
>                     <property>
>                       <name>mapred.jar</name>
>                       <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
>                     </property>
>                     <property>
>                       <name>mapred.cache.files</name>
>                       <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>                     </property>
>
>
>                     So it looks like the
>                     DistributedCache.addFileToClassPath call in
>                     Demux.addParsers is working, since the above job
>                     conf properties include my JAR.
>
>                     Here's my JAR contents:
>
>
>                     [kirk@skinner data-collection]$ unzip -l
>                     data-collection-demux/target/data-collection-demux-0.1.jar
>
>                     Archive:
>                      data-collection-demux/target/data-collection-demux-0.1.jar
>                       Length     Date   Time    Name
>                      --------    ----   ----    ----
>                             0  04-28-10 15:19   META-INF/
>                           123  04-28-10 15:19   META-INF/MANIFEST.MF
>                             0  04-28-10 15:19   org/
>                             0  04-28-10 15:19   org/apache/
>                             0  04-28-10 15:19   org/apache/hadoop/
>                             0  04-28-10 15:19   org/apache/hadoop/chukwa/
>                             0  04-28-10 15:19
>                       org/apache/hadoop/chukwa/extraction/
>                             0  04-28-10 15:19
>                       org/apache/hadoop/chukwa/extraction/demux/
>                             0  04-28-10 15:19
>                       org/apache/hadoop/chukwa/extraction/demux/processor/
>                             0  04-28-10 15:19
>                       org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
>                          1697  04-28-10 15:19
>                       org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
>                             0  04-28-10 15:19   META-INF/maven/
>                             0  04-28-10 15:19
>                       META-INF/maven/com.cisco.flip.datacollection/
>                             0  04-28-10 15:19
>                       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
>                          1448  04-28-10 00:23
>                       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
>                           133  04-28-10 15:19
>                       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
>                      --------                   -------
>                          3401                   16 files
>
>
>                     Here's how I'm copying the JAR into HDFS:
>
>
>                     hadoop fs -mkdir /chukwa/demux
>                     hadoop fs -copyFromLocal
>                     /path/to/data-collection-demux-0.1.jar /chukwa/demux
>
>                     Any ideas of more things to try?
>
>                     Thanks,
>                     Kirk
>
>
>                     On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang"
>                     <eyang@yahoo-inc.com> wrote:
>                     > Kirk,
>                     >
>                     > The shell script and job related information are
>                     > stored temporarily in
>                     > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xxx/,
>                     > while the job is running.
>                     >
>                     > You should go into the jars directory and find
>                     > out if the compressed jar contains your class
>                     > file.
>                     >
>                     > Regards,
>                     > Eric
>                     >
>                     > On 4/28/10 1:57 PM, "Kirk True"
>                     > <kirk@mustardgrain.com> wrote:
>                     >
>                     > > Hi Eric,
>                     > >
>                     > > I updated MapProcessorFactory.getProcessor to
>                     > > dump the URLs from the URLClassLoader from the
>                     > > MapProcessorFactory.class. This is what I see:
>                     > >
>                     > >
>                     > > file:/home/kirk/bin/hadoop-0.20.2/conf/
>                     > > file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/
>                     > > file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
>                     > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>                     > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/classes
>                     > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/
>                     > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>                     > >
>                     > >
>                     > > Is that the expected classpath? I don't see
>                     > > any reference to my JAR or the Chukwa JARs.
>                     > >
>                     > > Also, when I try to view the contents of my
>                     > > "job_<timestamp>_0001" directory, it's
>                     > > automatically removed, so I can't really do
>                     > > any forensics after the fact. I know this is
>                     > > probably a Hadoop question, but is it possible
>                     > > to prevent that auto-removal from occurring?
>                     > >
>                     > > Thanks,
>                     > > Kirk
>                     > >
>                     > > On Wed, 28 Apr 2010 13:16 -0700, "Kirk True"
>                     > > <kirk@mustardgrain.com> wrote:
>                     > >> Hi Eric,
>                     > >>
>                     > >> On 4/28/10 10:23 AM, Eric Yang wrote:
>                     > >>> Hi Kirk,
>                     > >>>
>                     > >>> Is the ownership of the jar file set up
>                     > >>> correctly as the user that runs demux?
>                     > >>
>                     > >> When browsing via the NameNode web UI, it
>                     > >> lists permissions of "rw-r--r--" and "kirk"
>                     > >> as the owner (which is also the user ID
>                     > >> running the Hadoop and Chukwa processes).
>                     > >>
>                     > >>> You may find more information by looking at
>                     > >>> the running mapper or reducer task, and try
>                     > >>> to find the task attempt shell script.
>                     > >>
>                     > >> Where is the task attempt shell script located?
>                     > >>
>                     > >>> Make sure the files are downloaded correctly
>                     > >>> from the distributed cache, and referenced
>                     > >>> in the locally generated jar file.  Hope
>                     > >>> this helps.
>                     > >>>
>                     > >>
>                     > >> Sorry for asking such basic questions, but
>                     > >> where is the locally generated JAR file
>                     > >> found? I'm assuming under /tmp/hadoop-<user>,
>                     > >> by default? I saw one file named
>                     > >> job_<timestamp>.jar but it appeared to be a
>                     > >> byte-for-byte copy of chukwa-core-0.4.0.jar,
>                     > >> i.e. my "XmlBasedDemux" class was nowhere to
>                     > >> be found.
>                     > >>
>                     > >> Thanks,
>                     > >> Kirk
>                     > >>
>                     > >>> Regards,
>                     > >>> Eric
>                     > >>>
>                     > >>> On 4/28/10 9:37 AM, "Kirk True"
>                     > >>> <kirk@mustardgrain.com> wrote:
>                     > >>>
>                     > >>>
>                     > >>>> Hi guys,
>                     > >>>>
>                     > >>>> I have a custom Demux that I need to run
>                     > >>>> to process my input, but I'm getting a
>                     > >>>> ClassNotFoundException when running in
>                     > >>>> Hadoop. This is with the released 0.4.0
>                     > >>>> build.
>                     > >>>>
>                     > >>>> I've done the following:
>                     > >>>>
>                     > >>>> 1. I put my Demux class in the correct package
>                     > >>>> (org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
>                     > >>>> 2. I've added the JAR containing the Demux
>                     > >>>> implementation to HDFS at /chukwa/demux
>                     > >>>> 3. I've added an alias to it in
>                     > >>>> chukwa-demux-conf.xml
>                     > >>>>
>                     > >>>> The map/reduce job is picking up on the
>                     > >>>> fact that I have a custom Demux and is
>                     > >>>> trying to load it, but I get a
>                     > >>>> ClassNotFoundException. The HDFS-based URL
>                     > >>>> to the JAR is showing up in the job
>                     > >>>> configuration in Hadoop, which is more
>                     > >>>> evidence that Chukwa and Hadoop know where
>                     > >>>> the JAR lives and that it's part of the
>                     > >>>> Chukwa-initiated job.
>                     > >>>>
>                     > >>>> My Demux is very simple. I've stripped it
>                     > >>>> down to a System.out.println with
>                     > >>>> dependencies on no other classes/JARs
>                     > >>>> other than Chukwa, Hadoop, and the core
>                     > >>>> JDK. I've double-checked that my JAR is
>                     > >>>> being built correctly. I'm completely
>                     > >>>> flummoxed as to what I'm doing wrong.
>                     > >>>>
>                     > >>>> Any ideas what I'm missing? What other
>                     > >>>> information can I provide?
>                     > >>>>
>                     > >>>> Thanks!
>                     > >>>> Kirk
>                     > >>>>
>                     > >>>>
>                     > >>>
>                     > >>
>                     > >
>                     > >
>                     >
>                     >
>
>
>
>
>
>
