hadoop-common-user mailing list archives

From "kumar, Senthil(AWF)" <senthiku...@ebay.com>
Subject RE: merging small files in HDFS
Date Thu, 03 Nov 2016 13:58:08 GMT
Can't we use getmerge here? It fits if your requirement is to merge the files in a particular
directory into a single file:

hadoop fs -getmerge <dir_of_input_files> <mergedsinglefile>
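
Note that getmerge writes the merged result to the local filesystem, not back to HDFS. If the single file has to end up in HDFS, one possible approach is a round trip through a local temporary file. The sketch below assumes the merged data fits on local disk; `merge_hdfs_dir` is a hypothetical helper name, not a Hadoop command, and the paths in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: merge every file in one HDFS directory into a single HDFS file.
# merge_hdfs_dir is a hypothetical helper name, not part of Hadoop.
merge_hdfs_dir() {
  local src_dir="$1"    # HDFS directory holding the small files
  local dst_file="$2"   # HDFS path for the merged file
  local tmp_dir
  tmp_dir="$(mktemp -d)" || return 1

  # getmerge concatenates the files under src_dir into a LOCAL file;
  # put -f then uploads it, overwriting dst_file if it already exists.
  hadoop fs -getmerge "$src_dir" "$tmp_dir/merged" \
    && hadoop fs -put -f "$tmp_dir/merged" "$dst_file"
  local status=$?

  # Remove the local copy (and any checksum file written next to it).
  rm -rf "$tmp_dir"
  return "$status"
}
```

Usage would look like `merge_hdfs_dir /user/me/small_files /user/me/merged.txt`. Because all data passes through one client machine, this is probably only reasonable for modest data volumes.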

--Senthil
-----Original Message-----
From: Giovanni Mascari [mailto:giovanni.mascari@polito.it] 
Sent: Thursday, November 03, 2016 7:24 PM
To: Piyush Mukati <piyush.mukati@gmail.com>; user@hadoop.apache.org
Subject: Re: merging small files in HDFS

Hi,
If I understand your request correctly, you only need to merge some data resulting from an
HDFS write operation.
In this case, I suppose your best option is to use Hadoop Streaming with the 'cat' command.

Take a look here:
https://hadoop.apache.org/docs/r1.2.1/streaming.html
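
A minimal sketch of such a streaming job, under some assumptions: the property name `mapreduce.job.reduces` is the Hadoop 2.x spelling, `STREAMING_JAR` is a variable you set yourself to the hadoop-streaming jar of your installation (its location varies), and `merge_with_streaming` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: concatenate line-oriented text files with Hadoop Streaming.
# merge_with_streaming is a hypothetical helper; STREAMING_JAR must point
# to the hadoop-streaming jar of your installation.
merge_with_streaming() {
  local input_dir="$1"    # HDFS directory with the small files
  local output_dir="$2"   # HDFS output directory; must not exist yet

  # A single reducer means the job writes a single output file.
  hadoop jar "${STREAMING_JAR:?set STREAMING_JAR to the hadoop-streaming jar}" \
    -D mapreduce.job.reduces=1 \
    -input "$input_dir" \
    -output "$output_dir" \
    -mapper cat \
    -reducer cat
}
```

With one reducer the merged data should land in a single part file under the output directory. Two caveats: the job still goes through shuffle/sort, so lines come out sorted rather than in their original order, and Streaming splits each line into key and value at the first tab, so it is worth comparing the output against the input before relying on it.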

Regards

On 03/11/2016 13:53, Piyush Mukati wrote:
> Hi,
> I want to merge multiple files in one HDFS dir into one file. I am 
> planning to write a map-only job using an input format which will create 
> only one InputSplit per dir.
> This way my job doesn't need to do any shuffle/sort (only read and write 
> back to disk). Is there any such file format already implemented?
> Or is there any better solution for the problem?
>
> thanks.
>
