hadoop-common-user mailing list archives

From Nick Cen <cenyo...@gmail.com>
Subject Re: Best way to write multiple files from a MR job?
Date Wed, 04 Mar 2009 03:16:33 GMT
Have you tried MultipleOutputFormat and its subclasses?
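
A rough sketch of the idea, assuming the old org.apache.hadoop.mapred API: a subclass of MultipleTextOutputFormat can override generateFileNameForKeyValue(key, value, name), where "name" is the leaf name (part-00000, etc.) that Hadoop has already made unique per task. Prefixing it with the record type gives one set of files per type without any manual file handling. The plain-Java class below only illustrates that naming logic; OutputNaming, fileNameFor and leafName are hypothetical names invented for this example, not Hadoop classes or methods. If you do end up writing files by hand, I believe the task's number is also available from the JobConf as "mapred.task.partition", but please check that against your Hadoop version.

```java
import java.util.Locale;

// Hypothetical illustration only: none of these names come from Hadoop.
final class OutputNaming {

    // Same shape as the body you would put in an override of
    // generateFileNameForKeyValue(key, value, name): prefix the
    // already-unique leaf name with the record type, so each type
    // lands in its own subdirectory of the job output directory.
    static String fileNameFor(String recordType, String leafName) {
        return recordType + "/" + leafName;
    }

    // Formats a task partition number the way the default part files
    // are named (part-00000, part-00001, ...). Assumption: five digits,
    // zero padded, as in the file names quoted in the question.
    static String leafName(int partition) {
        return String.format(Locale.ROOT, "part-%05d", partition);
    }

    public static void main(String[] args) {
        // Task 3 writing an "error" record and an "audit" record.
        System.out.println(fileNameFor("error", leafName(3))); // error/part-00003
        System.out.println(fileNameFor("audit", leafName(3))); // audit/part-00003
    }
}
```

Because the leaf name differs for every task, two mappers never write to the same path, even when they emit the same record type.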

2009/3/4 Stuart White <stuart.white1@gmail.com>

> I have a large amount of data, from which I'd like to extract multiple
> different types of data, writing each type of data to different sets
> of output files.  What's the best way to accomplish this?  (I should
> mention, I'm only using a mapper.  I have no need for sorting or
> reduction.)
> Of course, if I only wanted one output file, I could just .collect() the
> output from my mapper and let mapreduce write the output for me.  But,
> to get multiple output files, the only way I can see is to manually
> write the files myself from within my mapper.  If that's the correct
> way, then how can I get a unique filename for each mapper instance?
> Obviously hadoop has solved this problem, because it writes out its
> partition files (part-00000, etc...) with unique numbers.  Is there a
> way for my mappers to get this unique number being used so they can
> use it to ensure a unique filename?
> Thanks!

