hadoop-hdfs-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Re: rename operation
Date Thu, 26 Aug 2010 12:51:28 GMT
Hi JP,

I don't actually know the answer to your question, but we do a lot of work with files and
directories on HDFS and use renames to move files out of directories that are periodically
scanned by other processes. All I can say is that it has never gone wrong. We are happily
living with the assumption that the rename is atomic. Our directory scanning job runs every
couple of seconds and has done so without any error for months.

Short answer: I don't know, but it appears to be that way (ignorance is a blessing).


Friso



On 25 aug 2010, at 02:21, Jean-Pierre OCALAN wrote:

Hi,

I would like to know if the rename operation (i.e. renaming a directory or a single file)
can be considered an atomic operation in HDFS.

Basically, what I am trying to achieve is having one process that continuously adds new files
to HDFS and another process that starts a map/reduce flow every 15 minutes on the files
that were newly added to HDFS.

In other words, a process A continuously reads a local directory "A/in", into which new files
are continuously moved, and puts each file in an "A/tmp" directory on HDFS. When A finishes
putting a file in "A/tmp", it moves/renames that file into a "B/in" directory. At the same
time, a process B will, every 15 minutes, push all the files present in "B/in" to a map/reduce
flow.

Regards,

-- JP
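
The staging scheme described above (write into "A/tmp", then rename into "B/in") can be
sketched on a local filesystem. This is only an illustration under stated assumptions: it
uses Python's os.rename on a local POSIX filesystem as a stand-in, the directory names and
the helpers publish() and scan() are hypothetical, and it demonstrates nothing about HDFS's
own guarantees. On HDFS the rename step would go through the Hadoop client instead (for
example FileSystem.rename in the Java API).

```python
import os
import tempfile


def publish(local_file, tmp_dir, in_dir):
    """Stage a file in tmp_dir, then rename it into in_dir (hypothetical helper)."""
    name = os.path.basename(local_file)
    staged = os.path.join(tmp_dir, name)
    final = os.path.join(in_dir, name)
    # Slow step: the copy happens in a directory the scanner never looks at.
    with open(local_file, "rb") as src, open(staged, "wb") as dst:
        dst.write(src.read())
    # Fast step: a rename within one filesystem. On a local POSIX filesystem
    # this is atomic; whether HDFS behaves the same is the question in this thread.
    os.rename(staged, final)
    return final


def scan(in_dir):
    """Return the file names currently visible to process B, sorted by name."""
    return sorted(os.listdir(in_dir))


# Small demo in a throwaway directory tree (all names are made up).
root = tempfile.mkdtemp()
local_in = os.path.join(root, "local_in")
tmp_dir = os.path.join(root, "A_tmp")
in_dir = os.path.join(root, "B_in")
for d in (local_in, tmp_dir, in_dir):
    os.makedirs(d)

source = os.path.join(local_in, "events-0001.log")
with open(source, "w") as f:
    f.write("example record\n")

before = scan(in_dir)                      # nothing published yet
published = publish(source, tmp_dir, in_dir)
after = scan(in_dir)                       # the file appears only after the rename
```

The point of the design is that process B only ever lists "B/in", so it never sees a file
that is still being written; the file becomes visible in a single rename step.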

