hadoop-hdfs-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: Merging Files in HDFS
Date Fri, 22 Jul 2011 17:56:44 GMT
You could do it with streaming and a single reducer:

bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar \
  -Dmapred.reduce.tasks=1 -reducer cat \
  -input /hdfs/directory/allsource* \
  -output mergefile -verbose
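
The job's -output path is a directory, not a single file; with one reducer the merged data should end up in one part file inside it. Below is a minimal sketch of the follow-up steps, not part of the original suggestion. It assumes the part file is named part-00000 (the usual name for the first reducer's output in 0.20-era jobs) and that `fs -rmr` is available. The function name finish_merge, the HADOOP variable, and the target path in the usage comment are hypothetical. Note that the shuffle sorts lines by key, so the merged file will not preserve the original line order.

```shell
# Path to the hadoop launcher; override HADOOP to point elsewhere (assumption).
HADOOP="${HADOOP:-bin/hadoop}"

# finish_merge OUT_DIR MERGED SRC_GLOB
#   OUT_DIR  - the streaming job's -output directory (e.g. mergefile)
#   MERGED   - final HDFS path for the single merged file (hypothetical)
#   SRC_GLOB - quoted HDFS glob of the source files to delete afterwards
finish_merge() {
    out_dir="$1"
    merged="$2"
    src_glob="$3"

    # Promote the single reducer's part file to the final name
    # (part-00000 is assumed, check the output directory first).
    "$HADOOP" fs -mv "$out_dir/part-00000" "$merged" || return 1

    # Drop the now-empty job output directory (logs, markers).
    "$HADOOP" fs -rmr "$out_dir" || return 1

    # Remove the sources only after the move succeeded. The glob is
    # quoted so that HDFS, not the local shell, expands it.
    "$HADOOP" fs -rm "$src_glob"
}

# Usage (after the streaming job has finished successfully):
#   finish_merge mergefile /hdfs/directory/merged '/hdfs/directory/allsource*'
```

Each step returns early on failure, so the source files are never deleted unless the merged file is already in place.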

-Joey

On Fri, Jul 22, 2011 at 1:26 PM, Time Less <timelessness@gmail.com> wrote:

> Hello, List!
>
> I have several files in HDFS in a single directory that I create throughout
> the day. At the end of the day, I want to merge them together into one file.
> How do you guys do this?
>
> It seems this would do it:
> hadoop fs -getmerge /hdfs/directory/allsource* > mergefile ; cat mergefile
> | hadoop fs -put - ; rm mergefile ; hadoop fs -rm /hdfs/directory/allsource*
>
> But I wonder if there's a command that can avoid writing to the local
> filesystem then re-writing back into HDFS. I'm looking for an HDFS
> equivalent to this Unix script:
> cat /some/dir/allsource* > /some/dir/merged ; rm /some/dir/allsource*
>
> --
> Tim Ellis
> Data Architect, Riot Games
>
>


-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
