hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Multiple output files, and controlling output file name...
Date Fri, 21 Sep 2007 20:47:17 GMT

That would be very nice indeed to be able to split results into different

Regarding the naming of the reduce output files, I am finding that the name
of the directory that they are in is really more useful than the file names
themselves.  If you use that convention, then the names of the components
become irrelevant, much as the choice of inodes is irrelevant in a
conventional file system.
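
As a rough illustration of that convention, here is a minimal sketch in plain
Java against the local file system (not HDFS and not the Hadoop API).  The
class name OutputDirConvention, the helpers dirName and readAll, and the
sample rows are all hypothetical; only the "part-00000" style names and the
"loadA-200709221600" stamp come from the question below.  The point is just
that the meaningful name lives on the directory, and a consumer reads
whatever part files it finds inside.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OutputDirConvention {

    // Hypothetical helper: build a directory name such as "loadA-200709221600".
    static String dirName(String label, String timestamp) {
        return label + "-" + timestamp;
    }

    // Hypothetical helper: treat every part-* file in the directory as one
    // logical data set, reading the files in sorted name order.
    static List<String> readAll(Path dir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts);
        List<String> rows = new ArrayList<>();
        for (Path p : parts) {
            rows.addAll(Files.readAllLines(p, StandardCharsets.UTF_8));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a job output location; a real job would write to HDFS.
        Path base = Files.createTempDirectory("job-output");
        Path dir = Files.createDirectory(base.resolve(dirName("loadA", "200709221600")));

        // Stand-ins for what two reducers might have written.
        Files.write(dir.resolve("part-00000"), Arrays.asList("a\t1", "b\t2"), StandardCharsets.UTF_8);
        Files.write(dir.resolve("part-00001"), Arrays.asList("c\t3"), StandardCharsets.UTF_8);

        // The consumer only needs the directory name, not the part names.
        System.out.println(dir.getFileName() + ": " + readAll(dir).size() + " rows");
    }
}
```

A loader written this way never needs to know how many reducers ran or what
their part files were called; it is handed the directory and nothing else.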

On 9/21/07 1:20 PM, "C G" <parallelguy@yahoo.com> wrote:

> Hi All:
>   In the context of using the aggregation classes, is there any way to send
> output to multiple files?  In my case, I am processing columnar records that
> are very wide.  I have to do a variety of different aggregations and the
> results of each type of aggregation are a set of rows suitable for loading into
> a database.  Rather than write all the records to "part-00000", etc., I'd like
> to write them to a series of files based.  I don't see an obvious way to do
> this. Is it possible?
>   Also, for those of us that don't like "part-00000" and so forth as naming
> conventions, is there a way to name the output?  In my case, incorporating a
> date/time stamp like "loadA-200709221600" would be very useful.
>   Thanks for any advice,
>   C G