You could do it with streaming and a single reducer:
I have several files in HDFS in a single directory that I create throughout the day. At the end of the day, I want to merge them together into one file. How do you guys do this?
It seems this would do it:
hadoop fs -getmerge /hdfs/directory/allsource* mergefile ; hadoop fs -put mergefile /hdfs/directory/merged ; rm mergefile ; hadoop fs -rm /hdfs/directory/allsource*
But I wonder if there's a command that avoids writing to the local filesystem and then writing the result back into HDFS. I'm looking for an HDFS equivalent to this Unix script:
cat /some/dir/allsource* > /some/dir/merged ; rm /some/dir/allsource*
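One possible way to get close to that Unix one-liner is to pipe "hadoop fs -cat" straight into "hadoop fs -put -", which reads from standard input, so nothing is staged on local disk (the data still streams through the client machine). This is only a sketch: the function name merge_hdfs_files and both paths are placeholders of mine, it assumes the destination name does not match the source glob, and it does not detect a "-cat" that fails partway through.

```shell
#!/bin/sh
# Sketch: merge HDFS files into one HDFS file without a local temp file.
# merge_hdfs_files is a hypothetical helper name; the paths are placeholders.
merge_hdfs_files() {
    src_glob=$1   # e.g. '/hdfs/directory/allsource*' (quote it so Hadoop expands the glob)
    dest=$2       # e.g. /hdfs/directory/merged (must not match src_glob)

    # Stream every matching file through a pipe and back into HDFS.
    # "-put -" tells the shell command to read its input from stdin.
    # Only the exit status of -put is checked here, not that of -cat.
    hadoop fs -cat "$src_glob" | hadoop fs -put - "$dest" || return 1

    # Remove the source files only after the upload reported success.
    hadoop fs -rm "$src_glob"
}
```

Usage would look like: merge_hdfs_files '/hdfs/directory/allsource*' /hdfs/directory/merged. It is probably worth checking the size of the merged file before trusting the delete step.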
Data Architect, Riot Games