hadoop-common-user mailing list archives

From Ranjini Rathinam <ranjinibe...@gmail.com>
Subject Re: XML to TEXT
Date Fri, 03 Jan 2014 09:21:50 GMT
Hi,

I used XMLInputFormat with its RecordReader class, the same way you
suggested.

The whole XML is split into parts. For example, consider the XML below:

<Comp><Emp><id></id><name></name></Emp><Emp><id></id><name></name></Emp></Comp>

After using the RecordReader class, the XML output is:

<Emp><id></id><name></name></Emp><Emp><id></id><name></name></Emp>

The start and end tag is Emp.

The output is still XML; it is not converted into text.

Please suggest and help.

Thanks in advance

Ranjini
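
A minimal sketch of the missing step, assuming each value reaching the mapper is a string holding one or more <Emp>...</Emp> records as shown above. The class EmpRecordToText and its two methods are hypothetical helpers, not part of Hadoop or XMLInputFormat; they use only the JDK's built-in DOM parser. The second sample record (101, SS) is made up for illustration. In a real job, something like toCsvLine would be called from map() and its result written out as the text value.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not a Hadoop class): converts <Emp> records to text lines.
final class EmpRecordToText {

    // Parses one record and joins the text of its direct child elements with commas.
    static String toCsvLine(String recordXml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(recordXml)));
            Element root = doc.getDocumentElement(); // the <Emp> element
            List<String> values = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    values.add(child.getTextContent().trim());
                }
            }
            return String.join(",", values);
        } catch (Exception e) {
            // Wrap parser errors so callers do not need checked-exception handling.
            throw new IllegalArgumentException("Not a well-formed record: " + recordXml, e);
        }
    }

    // Splits a string holding several records back to back into single records,
    // using the same begin and end tags given to the record reader.
    static List<String> splitRecords(String xml, String begin, String end) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(begin, from);
            if (start < 0) break;
            int stop = xml.indexOf(end, start);
            if (stop < 0) break;
            stop += end.length();
            records.add(xml.substring(start, stop));
            from = stop;
        }
        return records;
    }

    public static void main(String[] args) {
        // Sample input: the first record is from this thread, the second is invented.
        String value = "<Emp><id>100</id><name>RR</name></Emp>"
                + "<Emp><id>101</id><name>SS</name></Emp>";
        System.out.println("id,name");
        for (String record : splitRecords(value, "<Emp>", "</Emp>")) {
            System.out.println(toCsvLine(record));
        }
    }
}
```

The header line id,name is printed once by the caller here; in a MapReduce job it would have to be added separately, since each map() call only sees its own record.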

On Fri, Jan 3, 2014 at 11:22 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> Hi,
>
> You can use org.apache.hadoop.streaming.StreamInputFormat in a MapReduce
> job to convert XML to text.
>
> For example, if your XML looks like this:
> <xml>
>   <name>lll</name>
> </xml>
>
> You need to specify stream.recordreader.begin and stream.recordreader.end
> in the Configuration:
> Configuration conf = new Configuration();
> conf.set("stream.recordreader.begin", "<xml>");
> conf.set("stream.recordreader.end", "</xml>");
>
> On Fri, Jan 3, 2014 at 1:16 PM, Ranjini Rathinam <ranjinibecse@gmail.com> wrote:
>
>> Hi,
>>
>> I need to convert XML into text using MapReduce.
>>
>> I have used the DOM and SAX parsers.
>>
>> After using SAX Builder in the mapper class, the child node acts as the
>> root element.
>>
>> Looking at System.out, I found that the root element is taking the child
>> element and printing it.
>>
>> For example,
>>
>> <Comp><Emp><id>100</id><name>RR</name></Emp></Comp>
>> When this XML is passed to the mapper and I print the root element to
>> System.out, I get the root element as
>>
>> <id>
>> <name>
>>
>> Please suggest and help to fix this.
>>
>> I need to convert the XML into text using MapReduce code. Please provide
>> an example.
>>
>> Required output is
>>
>> id,name
>> 100,RR
>>
>> Please help.
>>
>> Thanks in advance,
>> Ranjini R
