hadoop-common-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: XML input to map function
Date Tue, 03 Nov 2009 00:00:47 GMT
Are the XMLs in flat files or stored in HBase?

1. If they are in flat files, you can use StreamXmlRecordReader, if that
works for you.

2. Or you can read each XML into a single string and process it however you
want. (This works whether it's in a flat file or stored in an HBase table.)
I have XMLs in an HBase table and parse and process them as strings.
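As a minimal sketch of option 2, here is how one record, already read into a
string, could be parsed with Python's standard library. The record layout
(`<record>` with `id` and `value` children) is purely hypothetical, since the
original thread doesn't show the actual schema:

```python
import xml.etree.ElementTree as ET

def parse_record(xml_string):
    """Parse one XML record passed in as a single string and
    return its child elements as a tag -> text dict.

    Assumes a hypothetical layout like:
    <record><id>...</id><value>...</value></record>
    """
    root = ET.fromstring(xml_string)
    return {child.tag: child.text for child in root}

# One record read as a single string, as described above.
record = "<record><id>42</id><value>foo</value></record>"
print(parse_record(record))  # {'id': '42', 'value': 'foo'}
```

The same function works regardless of whether the string came from a flat
file or from an HBase cell.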

One mapper per file doesn't make sense. If the data is in HBase, have one
mapper per region. If they are flat files, you can create mappers depending
on how many files you have. You can tune this for your particular
requirement; there is no single "right" way to do it.
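For the flat-file case in option 1, a Hadoop Streaming job can be pointed at
StreamXmlRecordReader so each map input value is one XML record. A rough
sketch follows; the `begin`/`end` tags, paths, and mapper script name are all
hypothetical, and the streaming jar location varies by Hadoop version:

```shell
# Hypothetical Hadoop Streaming invocation: StreamXmlRecordReader splits the
# flat XML files into one <record>...</record> chunk per map input value.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
  -inputreader "StreamXmlRecordReader,begin=<record>,end=</record>" \
  -input /user/vipul/xml-input \
  -output /user/vipul/xml-output \
  -mapper parse_record.py \
  -file parse_record.py
```

The number of map tasks then falls out of the input splits rather than being
one mapper per file.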

On Mon, Nov 2, 2009 at 3:01 PM, Vipul Sharma <sharmavipul@gmail.com> wrote:

> I am working on a mapreduce application that will take input from lots of
> small XML files rather than one big XML file. Each XML file has some
> records that I want to parse before inserting the data into an HBase
> table. How should I go about parsing the XML files and feeding them into
> map functions? Should I have one mapper per XML file, or is there another
> way of doing this? Thanks for your help and time.
> Regards,
> Vipul Sharma,
