hadoop-hdfs-user mailing list archives

From madhu phatak <phatak....@gmail.com>
Subject Re: Processing xml documents using StreamXmlRecordReader
Date Tue, 19 Jun 2012 10:58:12 GMT
Hi,
 Set the following properties in the driver class:

  jobConf.set("stream.recordreader.class",
      "org.apache.hadoop.streaming.StreamXmlRecordReader");
  jobConf.set("stream.recordreader.begin", "start-tag");
  jobConf.set("stream.recordreader.end", "end-tag");
  jobConf.setInputFormat(StreamInputFormat.class);
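
 To make the role of the begin and end properties concrete, here is a rough plain-Java sketch of marker-based record splitting. It is not Hadoop's StreamXmlRecordReader code; the class name XmlRecordSplitter, the method split, and the <doc> markers are hypothetical names used only for illustration. It assumes each record runs from a begin marker through the next end marker, with both markers included.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a simplified stand-in for marker-based record splitting,
// not the actual Hadoop StreamXmlRecordReader implementation.
class XmlRecordSplitter {

    // Returns every substring that starts at beginMarker and ends at the
    // next endMarker, including both markers.
    static List<String> split(String input, String beginMarker, String endMarker) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(beginMarker, pos);
            if (start < 0) {
                break; // no further record begins
            }
            int end = input.indexOf(endMarker, start + beginMarker.length());
            if (end < 0) {
                break; // unterminated record: drop it in this sketch
            }
            end += endMarker.length();
            records.add(input.substring(start, end));
            pos = end;
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical input; <doc> and </doc> play the role of the
        // begin and end property values.
        String xml = "<docs><doc><id>1</id></doc>\n<doc><id>2</id></doc></docs>";
        for (String record : split(xml, "<doc>", "</doc>")) {
            System.out.println(record);
        }
    }
}
```

 In the job configuration, the two markers would be literal strings such as "<doc>" and "</doc>" in place of "start-tag" and "end-tag" (assuming your records are wrapped in a <doc> element).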

 In the mapper, each XML record arrives as the key, of type Text, so your
mapper will look like:

  public class MyMapper<K, V> implements Mapper<Text, Text, K, V>
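
 Since the whole record arrives as text, one possible way to handle it inside map() is to parse key.toString() with the JDK's DOM parser. The sketch below shows only that parsing step in plain Java, without the Hadoop classes; XmlRecordParser, firstElementText, and the <doc>/<id> tags are hypothetical names, and it assumes each record is a well-formed XML fragment.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: parses one XML record, as it might be obtained from
// key.toString() in the mapper, and pulls out one element's text.
class XmlRecordParser {

    // Returns the text content of the first element named tagName,
    // or null if the record contains no such element.
    static String firstElementText(String xmlRecord, String tagName) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xmlRecord)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            // Malformed records surface as an unchecked exception in this sketch.
            throw new IllegalArgumentException("Could not parse XML record", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical record of the kind the record reader might emit.
        String record = "<doc><id>42</id><title>Hello</title></doc>";
        System.out.println(firstElementText(record, "title"));
    }
}
```

 For large inputs a streaming parser may be a better fit than DOM; the DOM version is used here only because it is short.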


On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <dontariq@gmail.com> wrote:

> Hello list,
>
>        Could anyone who has written MapReduce jobs to process XML
> documents stored in their cluster using "StreamXmlRecordReader" share
> his/her experience? Or could you provide me some pointers
> on that? Many thanks.
>
> Regards,
>     Mohammad Tariq
>



-- 
https://github.com/zinnia-phatak-dev/Nectar
