hadoop-common-user mailing list archives

From Arun C Murthy <ar...@yahoo-inc.com>
Subject Re: Multiple output files, and controlling output file name...
Date Fri, 21 Sep 2007 21:05:51 GMT
On Fri, Sep 21, 2007 at 01:53:21PM -0700, Joydeep Sen Sarma wrote:
>Why don't you create/write to HDFS files directly from the reduce job (don't
>depend on the default reduce output dir/files)?
>Like the cases where input is not homogeneous, this seems (at least to
>me) to be another common pattern (output is not homogeneous). I have run
>into this when loading data into Hadoop (and wanting to organize
>different types of records into different dirs/files).

>Just make sure (somehow) that different reduce jobs don't try to write
>to the same file.

Quick note: as long as you create files in the 'mapred.output.dir' directory (via map/reduce
tasks) on HDFS, the framework will handle issues with speculative tasks, etc.
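
To make the two points above concrete (keep the files under the job's output directory, and make sure no two reduce tasks write to the same file), here is a rough sketch of one possible naming scheme. It is only an illustration: plain java.nio stands in for HDFS, no Hadoop classes are used, and the class name, the method names and the idea of passing the task number in by hand are all hypothetical. The "loadA-200709221600" prefix is the example from the original question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;

/** Sketch only: java.nio stands in for HDFS; every name here is hypothetical. */
final class ReduceOutputNames {

    /** Builds e.g. "loadA-200709221600-00003": record type, timestamp, zero-padded task number. */
    static String fileName(String recordType, String timestamp, int taskNumber) {
        return String.format(Locale.ROOT, "%s-%s-%05d", recordType, timestamp, taskNumber);
    }

    /** Places the file under outputDir/recordType, so each record type gets its own directory. */
    static Path outputPath(Path outputDir, String recordType, String timestamp, int taskNumber) {
        return outputDir.resolve(recordType).resolve(fileName(recordType, timestamp, taskNumber));
    }

    /** Writes the rows to that path, creating the per-type directory if it does not exist yet. */
    static Path writeRows(Path outputDir, String recordType, String timestamp,
                          int taskNumber, Iterable<String> rows) throws IOException {
        Path target = outputPath(outputDir, recordType, timestamp, taskNumber);
        Files.createDirectories(target.getParent());
        Files.write(target, rows, StandardCharsets.UTF_8);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory plays the role of the job's output directory.
        Path outputDir = Files.createTempDirectory("job-output");
        // Two simulated reduce tasks write the same record type without colliding.
        Path first = writeRows(outputDir, "loadA", "200709221600", 0, Arrays.asList("row1", "row2"));
        Path second = writeRows(outputDir, "loadA", "200709221600", 1, Arrays.asList("row3"));
        System.out.println(first);
        System.out.println(second);
    }
}
```

Because the task number is part of the name, two tasks can only collide if they share a task number, and the record type as a directory gives the "different dirs/files per record type" layout mentioned in the quoted mail. How the task number and output directory are obtained inside a real job is not shown here.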


>-----Original Message-----
>From: C G [mailto:parallelguy@yahoo.com] 
>Sent: Friday, September 21, 2007 1:20 PM
>To: hadoop-user@lucene.apache.org
>Subject: Multiple output files, and controlling output file name...
>Hi All:
>  In the context of using the aggregation classes, is there any way to
>send output to multiple files?  In my case, I am processing columnar
>records that are very wide.  I have to do a variety of different
>aggregations, and the results of each type of aggregation are a set of
>rows suitable for loading into a database.  Rather than write all the
>records to "part-00000", etc., I'd like to write them to a series of
>files based.  I don't see an obvious way to do this. Is it possible?
>  Also, for those of us that don't like "part-00000" and so forth as
>naming conventions, is there a way to name the output?  In my case,
>incorporating a date/time stamp like "loadA-200709221600" would be very
>  Thanks for any advice,
>  C G