hadoop-hdfs-user mailing list archives

From Time Less <timelessn...@gmail.com>
Subject Merging Files in HDFS
Date Fri, 22 Jul 2011 17:26:06 GMT
Hello, List!

Throughout the day I create several files in a single HDFS directory. At the
end of the day, I want to merge them into one file. How do you guys do this?

It seems this would do it:
hadoop fs -getmerge /hdfs/directory/allsource* mergefile ; cat mergefile |
hadoop fs -put - /hdfs/directory/merged ; rm mergefile ; hadoop fs -rm /hdfs/directory/allsource*

But I wonder if there's a command that avoids writing to the local
filesystem and then rewriting the data back into HDFS. I'm looking for an
HDFS equivalent to this Unix script:
cat /some/dir/allsource* > /some/dir/merged ; rm /some/dir/allsource*
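
For illustration, the closest shell-level analogue I can think of is to pipe `hadoop fs -cat` straight into `hadoop fs -put -`, which reads from stdin. This is only a sketch: the function name `merge_hdfs_files`, its two parameters, and the example paths are hypothetical. The data still streams through the client machine, but nothing is written to the local disk.

```shell
# Sketch: merge HDFS files matching a glob into one HDFS file, with no local temp file.
# merge_hdfs_files, src_glob and dest are hypothetical names used for illustration.
merge_hdfs_files() {
    src_glob=$1   # e.g. '/hdfs/directory/allsource*' (quoted so Hadoop expands the glob)
    dest=$2       # e.g. '/hdfs/directory/merged' (assumed NOT to match the glob)

    # Stream the sources through the client and back into HDFS via stdin.
    # The sources are removed only if the -put side exits successfully.
    # Caveat: in plain sh a failure on the -cat side is not detected, so it
    # may be worth checking the merged file before trusting the delete.
    hadoop fs -cat "$src_glob" | hadoop fs -put - "$dest" &&
        hadoop fs -rm "$src_glob"
}
```

It would be called as `merge_hdfs_files '/hdfs/directory/allsource*' /hdfs/directory/merged`, with the glob in single quotes so the local shell leaves it for Hadoop to expand.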

Tim Ellis
Data Architect, Riot Games