hadoop-hdfs-user mailing list archives

From Rex X <dnsr...@gmail.com>
Subject Hadoop Streaming: How to partition output into subfolders?
Date Wed, 20 Jan 2016 23:43:15 GMT
Dear all,

As a concrete example, given the following streaming job:

    hadoop jar hadoop-streaming.jar \
      -input myInputDirs \
      -output myOutputDir \
      -mapper /bin/cat \
      -reducer /usr/bin/wc

Where myInputDirs has a *dated* subfolder structure of

       /input_dir/yyyy/mm/dd/part-*

I want myOutputDir to have the same *dated* subfolder structure:

       /output_dir/yyyy/mm/dd/part-*
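
To make the desired mapping concrete, here is a minimal Python sketch. It only
illustrates the path transformation I am after. `dated_output_path` is a
hypothetical helper of my own, not a Hadoop Streaming option or API, and it
does not touch HDFS.

```python
import posixpath


def dated_output_path(input_path, input_root="/input_dir", output_root="/output_dir"):
    """Map <input_root>/yyyy/mm/dd/part-N to <output_root>/yyyy/mm/dd/part-N.

    Hypothetical helper for illustration only; it manipulates path strings
    and does not call into Hadoop.
    """
    # Path of the file relative to the input root, e.g. "2016/01/20/part-00000".
    rel = posixpath.relpath(input_path, input_root)
    # Reject anything that does not live under the input root.
    if rel == ".." or rel.startswith("../"):
        raise ValueError("%s is not under %s" % (input_path, input_root))
    # Re-root the same yyyy/mm/dd/part-N suffix under the output directory.
    return posixpath.join(output_root, rel)


if __name__ == "__main__":
    print(dated_output_path("/input_dir/2016/01/20/part-00000"))
```

In other words, every record read from a given yyyy/mm/dd input folder should
end up under the matching yyyy/mm/dd folder of the output.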

I guess there should be an option for this. Can "-partitioner" or any "-D"
option achieve it?


Thanks & regards,
Rex
