hadoop-common-user mailing list archives

From "Feng Jiang" <feng.a.ji...@gmail.com>
Subject Re: Is hadoop good for the my job?
Date Wed, 27 Sep 2006 04:37:06 GMT
One principle is that the input file must be a sequence of key/value pairs,
and you must have an input formatter for the input file. Otherwise you cannot
use MapReduce directly.

For your case, it seems that the input file does not consist of a sequence of
pairs, so it may not be suitable for MapReduce.
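
To illustrate what "a sequence of pairs" would mean for the book records quoted below, here is a minimal Python sketch. It is not Hadoop code: the function name map_book and the integer keys are assumptions made for illustration, and it only shows the shape of a map step that turns one (key, XML fragment) pair into one (key, CSV line) pair.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical map step: one (key, XML fragment) pair in,
# one (key, CSV line) pair out. Not a Hadoop API.
def map_book(key, xml_fragment):
    book = ET.fromstring(xml_fragment)
    # Field names are taken from the <book> example in this thread.
    fields = [book.findtext(tag, default="") for tag in ("title", "author", "ISBN")]
    buf = io.StringIO()
    # The csv module handles quoting if a field contains a comma.
    csv.writer(buf, lineterminator="").writerow(fields)
    return key, buf.getvalue()

# The input has to be a sequence of pairs already; the key here is
# just a record index chosen for the example.
pairs = [
    (0, "<book><title>hadoop</title><author>peter</author><ISBN>121332</ISBN></book>"),
]
for key, fragment in pairs:
    print(map_book(key, fragment))
```

The point of the sketch is that each record is processed on its own, which only works once the big file has been cut into independent pairs.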

On 9/27/06, howard chen <howachen@gmail.com> wrote:
>
> On 9/25/06, Feng Jiang <feng.a.jiang@gmail.com> wrote:
> > MapReduce doesn't know anything about your application logic. As long as
> > you can split the big XML into a lot of small XML files, Hadoop could
> > help you.
> >
> > 1. Split this big XML file into, for example, 10000 small XML files.
> > 2. Each small XML file could be one pair in a sequence file.
> > 3. Then use MapReduce to read the sequence file and parse the entries,
> > for example with 10 map and reduce tasks.
> > 4. Finally you have 10 output files, which contain the format you want.
> >
>
>
> Hello,
>
> In my example, parsing XML to CSV seems to be a one-to-one mapping, e.g.
>
> <book>
>     <title>hadoop</title>
>     <author>peter</author>
>     <ISBN>121332</ISBN>
> </book>
>
> would become (CSV)
> hadoop,peter,121332
>
> So using MapReduce does not seem suitable?
>
> thanks.
>
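
The splitting step suggested earlier in the thread (one big XML file cut into many small records) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name split_books and the enclosing <books> root element are hypothetical, and writing the pairs into an actual Hadoop sequence file is not shown.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical splitter: streams through a large XML document and
# yields one (index, xml_string) pair per <book> element.
def split_books(stream):
    index = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "book":
            yield index, ET.tostring(elem, encoding="unicode").strip()
            index += 1
            elem.clear()  # drop the element's children to limit memory use

# Assumed layout: <book> records wrapped in a <books> root element.
doc = b"<books><book><title>hadoop</title></book><book><title>nutch</title></book></books>"
for key, fragment in split_books(io.BytesIO(doc)):
    print(key, fragment)
```

Each yielded pair is independent of the others, so the pairs can be handed to separate map tasks.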
