hadoop-common-user mailing list archives

From Teodor Macicas <teodor.maci...@epfl.ch>
Subject HDFS efficiently concat&split files
Date Thu, 19 Aug 2010 10:58:18 GMT
Hi all,

Does anyone know how to efficiently concatenate two different files in
HDFS, or split one file into two?
So far I have done this by reading from one file and writing to another.
Of course, this is very slow, and a lot of time is spent on I/O. Since the
job only splits or joins files, I am wondering whether it can be done faster.
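
For reference, the copy-based approach described above might look roughly like the sketch below. It is only an illustration under assumptions: the class and method names (CopyConcatSplit, concat, split) are hypothetical, and it works on plain java.io streams. With HDFS, the streams would presumably come from FileSystem.open and FileSystem.create. Every byte is read and rewritten, which is where the I/O time goes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper illustrating copy-based concat and split.
public class CopyConcatSplit {
    private static final int BUFFER_SIZE = 64 * 1024;

    // Appends every byte of each input, in order, to out.
    public static void concat(OutputStream out, InputStream... inputs) throws IOException {
        byte[] buf = new byte[BUFFER_SIZE];
        for (InputStream in : inputs) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    // Copies the first splitAt bytes of in to first and the remainder to second.
    public static void split(InputStream in, long splitAt,
                             OutputStream first, OutputStream second) throws IOException {
        byte[] buf = new byte[BUFFER_SIZE];
        long remaining = splitAt;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                return; // input is shorter than splitAt; second stays empty
            }
            first.write(buf, 0, n);
            remaining -= n;
        }
        int n;
        while ((n = in.read(buf)) != -1) {
            second.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the HDFS input and output streams.
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        concat(joined,
               new ByteArrayInputStream("part-one;".getBytes("UTF-8")),
               new ByteArrayInputStream("part-two".getBytes("UTF-8")));

        ByteArrayOutputStream head = new ByteArrayOutputStream();
        ByteArrayOutputStream tail = new ByteArrayOutputStream();
        split(new ByteArrayInputStream(joined.toByteArray()), 9, head, tail);

        System.out.println(joined.toString("UTF-8"));
        System.out.println(head.toString("UTF-8") + " | " + tail.toString("UTF-8"));
    }
}
```

The cost of both operations grows linearly with the file size, since nothing is shared between the source and the destination files.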

Also, what can I do in order to control the size of a reducer's output file?
This could solve the previous problem: if I were able to do
this, further concats and splits would not be necessary.

Thank you for your help.
Best,
Teodor
