hadoop-common-user mailing list archives

From Andreas Kostyrka <andr...@kostyrka.org>
Subject Re: Concatenating files on HDFS
Date Thu, 27 Aug 2009 08:50:23 GMT
Actually, for many use cases it's enough to keep a directory where all
parts of a given "logical" file are kept:

-) for input to Hadoop jobs, you just specify the directory.
-) if you need the data in one piece externally, you can cat the whole
directory into one file.

Hence, in my experience, one rarely needs to concatenate files inside HDFS.
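
A minimal sketch of that pattern with the Hadoop shell (directory and file
names are hypothetical, and a configured Hadoop client is assumed):

```shell
# A MapReduce job's output is already a directory of part files:
hadoop fs -ls /data/logs-merged        # part-00000, part-00001, ...

# Downstream jobs simply take the directory as their input path.

# To get one file outside HDFS, concatenate the parts in order:
hadoop fs -cat '/data/logs-merged/part-*' > logs.txt
# or let Hadoop do the merge for you:
hadoop fs -getmerge /data/logs-merged logs.txt
```

`-getmerge` and the glob form are equivalent here; both read the part files
in name order, which is why keeping the parts in one directory is enough.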


Am Donnerstag, den 27.08.2009, 11:04 +0530 schrieb Ankur Goel:
> HDFS files are write-once, so you cannot append to them (at the moment).
> What you can do is copy your local file into the HDFS directory containing
> the file you want to append to. Once that is done, you can run a simple
> MapReduce job (identity mapper & identity reducer) with this directory as
> the input and the number of reducers = 1.
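
With Hadoop Streaming, the job Ankur describes can be sketched as follows
(jar path and directories are hypothetical; `cat` serves as the identity
mapper and reducer for text records). One caveat: the shuffle sorts records
by key, so the merged output will not preserve the original line order:

```shell
# Copy the local file into the directory holding the HDFS file:
hadoop fs -put newdata.txt /data/mydir/

# Identity map/reduce with a single reducer merges the directory
# into one part file:
hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
  -D mapred.reduce.tasks=1 \
  -input  /data/mydir \
  -output /data/merged \
  -mapper  cat \
  -reducer cat

hadoop fs -ls /data/merged   # a single part-00000 with all records
```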
> ----- Original Message -----
> From: "Turner Kunkel" <thkunkel@gmail.com>
> To: core-user@hadoop.apache.org
> Sent: Wednesday, August 26, 2009 10:02:41 PM GMT+05:30 Chennai, Kolkata, Mumbai, New Delhi
> Subject: Concatenating files on HDFS
> Is there any way to concatenate/append a local file to a file on HDFS
> without copying down the HDFS file locally first?
> I tried:
> bin/hadoop dfs -cat file:///[local file] >> hdfs://[hdfs file]
> But it just creates a local file named hdfs://[hdfs file], since the >>
> redirection is handled by the local shell, not by the dfs -cat command.
> Thanks.
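
For readers of the archive: at the time of this thread HDFS had no append,
but later Hadoop releases (2.x onward) added `appendToFile`, so a local file
can be appended to an HDFS file directly, without the MapReduce detour
(target path is hypothetical):

```shell
# Requires a Hadoop version and configuration with append support:
hadoop fs -appendToFile localfile.txt /data/target.txt
```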
