hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject HDFS moves and MapReduce jobs
Date Fri, 09 Jul 2010 21:43:08 GMT
This is a question I should go and test myself, but I was wondering
if anyone has a quick answer.

We have map/reduce jobs that write lots of small files to a folder.
We also have a Hive external table pointed at this folder.
We have a tool, FileCrusher, which combines multiple small TEXT and
SEQUENCE files into one large file. (We are going to open source it to
help people with problems caused by lots of files.)

It is launched something like this: FileCrusher /src/folder.
This process builds one large file in a temp directory; once done, it
moves the old files to a junk folder and moves the new file into the

What I am trying to figure out is this: if a map/reduce job is started
before the files are moved, so the splits are calculated and the job is
running, what will happen if I then move the files in /src/folder and
replace them with a new file?
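
To make the sequence concrete, here is a rough sketch of the
crush-and-swap steps. It is not FileCrusher's actual code: it uses the
local filesystem as a stand-in for HDFS, and the function name
crush_and_swap, its directory arguments, and the merged file name are
all made up for illustration. It also assumes the new file ends up back
in the source folder.

```python
import os
import shutil


def crush_and_swap(src_dir, junk_dir, tmp_dir, merged_name="crushed.txt"):
    """Concatenate the files in src_dir into one file, then swap it in.

    Hypothetical illustration only; local paths stand in for HDFS paths.
    """
    old_files = sorted(
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )

    # Step 1: build the single large file in a temp directory.
    os.makedirs(tmp_dir, exist_ok=True)
    merged_tmp = os.path.join(tmp_dir, merged_name)
    with open(merged_tmp, "wb") as out:
        for name in old_files:
            with open(os.path.join(src_dir, name), "rb") as part:
                shutil.copyfileobj(part, out)

    # Step 2: move the old small files to the junk folder.
    os.makedirs(junk_dir, exist_ok=True)
    for name in old_files:
        shutil.move(os.path.join(src_dir, name), os.path.join(junk_dir, name))

    # Step 3 (assumed destination): move the new file into the source folder.
    merged_final = os.path.join(src_dir, merged_name)
    shutil.move(merged_tmp, merged_final)
    return merged_final
```

The case I am asking about is a job that computed its splits before
step 2 and is still running while steps 2 and 3 happen.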

I am hoping that, since the splits are associated with blocks, the
job will produce correct results no matter when the files are
moved. In other words, after split calculation the job should be

