hadoop-hdfs-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: XML Streaming in Hadoop
Date Sun, 06 Mar 2011 19:32:13 GMT

On Mon, Mar 7, 2011 at 12:22 AM, Clement Jebakumar <jeba.ride@gmail.com> wrote:
> I want to parse XML file in hadoop.
> I have my own mapper class called "MyXMLMapper"...
> $ ./bin/hadoop jar hadoop-streaming.jar -inputreader
> "StreamXmlRecordReader,begin='<Page',end='</Page>'" -file
> /home/hdfs/XML2HBase.jar -mapper MyXMLMapper -output /temp/sample -input
> /temp/example.xml
> Caused by: java.io.IOException: Cannot run program "DmozXMLMapper":
> java.io.IOException: error=2, No such file or directory
> 	at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
> Caused by: java.io.IOException: java.io.IOException: error=2, No such file
> or directory

From what I understand, your MyXMLMapper/DmozXMLMapper Java class is
not being found by the streaming runner, so it is probably being
treated as a shell program instead of a Java class, and hence fails.

The issue with your command is that "-file" simply ships the given
files to the MR cluster but does not add them to the runtime
classpath of your mappers/reducers. Using "-libjars" instead to
specify your XML2HBase.jar as a dependent jar should solve this, if I
am right.
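
Something along these lines might work (same paths and class names as
in your original command; I am assuming the streaming jar in use
accepts "-libjars" as a generic option, in which case it must come
before the streaming-specific options):

```shell
# Sketch of the corrected invocation: the jar is passed via -libjars so
# the mapper class lands on the task classpath, instead of via -file.
./bin/hadoop jar hadoop-streaming.jar \
  -libjars /home/hdfs/XML2HBase.jar \
  -inputreader "StreamXmlRecordReader,begin='<Page',end='</Page>'" \
  -mapper MyXMLMapper \
  -input /temp/example.xml \
  -output /temp/sample
```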

You can see all the other available options of hadoop-streaming by
passing "-info". Hope this helps.

Harsh J
