hadoop-hdfs-dev mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: Cp command is not atomic
Date Wed, 25 May 2016 16:33:27 GMT
Hello Kun,

You are correct that "hdfs dfs -cp" is not atomic, but the details of that
are a bit different from what you described.  For the example you gave,
the sequence of events would be:

1. Open a.xml.
2. Create file b.xml._COPYING_.
3. Copy the bytes from a.xml to b.xml._COPYING_.
4. Rename b.xml._COPYING_ to b.xml.

b.xml._COPYING_ is a temporary file.  All the bytes are written to this
location first.  Only if the full copy succeeds does the command proceed
to step 4 and rename the file to its final destination, b.xml.  The rename
is atomic, so overall, b.xml will never contain partially written data.
Either the whole copy succeeds, or the copy fails and b.xml doesn't exist.
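To make the four steps concrete, here is a minimal local-filesystem sketch of the same copy-to-temporary-then-rename pattern. It is an analog under stated assumptions, not the HDFS client code: plain cp and mv stand in for the client's byte copy and the NameNode rename, and copy_via_temp is a hypothetical helper name. On a local POSIX filesystem, mv within a single filesystem performs a rename(2), which is atomic.

```shell
#!/bin/sh
# Local-filesystem analog of the "hdfs dfs -cp" sequence (not HDFS itself).
set -eu

# copy_via_temp SRC DST (hypothetical helper, for illustration only)
copy_via_temp() {
  src=$1
  dst=$2
  tmp="${dst}._COPYING_"      # step 2: temporary destination name
  if cp "$src" "$tmp"; then   # steps 1 and 3: read SRC, write all bytes to tmp
    mv -f "$tmp" "$dst"       # step 4: rename(2) into place, atomic on one filesystem
  else
    rm -f "$tmp"              # copy failed: remove the temp file, DST never created
    return 1
  fi
}

# Usage in a scratch directory.
workdir=$(mktemp -d)
printf '<configuration/>\n' > "$workdir/a.xml"
copy_via_temp "$workdir/a.xml" "$workdir/b.xml"
ls "$workdir"
```

The cleanup branch only runs when cp itself reports failure; if the whole process is killed partway through, nothing removes the temporary file.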

However, even though the rename is atomic, we can't claim the overall
operation is atomic.  For example, if the process dies between step 2 and
step 3, then the command leaves a lingering side effect in the form of the
b.xml._COPYING_ file.
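A minimal local-filesystem sketch of that failure mode, assuming plain find and rm as stand-ins for the equivalent HDFS listing and deletion: it simulates a copy that died after step 2 and then locates the leftover b.xml._COPYING_ file. Deleting such files is only safe when no copy is still running, because an in-progress copy writes to the same kind of name.

```shell
#!/bin/sh
# Local-filesystem simulation of an interrupted copy (not HDFS itself).
set -eu

workdir=$(mktemp -d)
printf '<configuration/>\n' > "$workdir/a.xml"

# Simulate a process that created the temp file (step 2) and then died
# before copying any bytes (step 3): an empty b.xml._COPYING_ remains.
: > "$workdir/b.xml._COPYING_"

# Report lingering temporary files left behind by interrupted copies.
find "$workdir" -type f -name '*._COPYING_' -print

# The final destination was never created.
[ -e "$workdir/b.xml" ] || echo "b.xml does not exist"

# Clean up leftovers; only do this when no copy is still in progress.
find "$workdir" -type f -name '*._COPYING_' -exec rm -f {} +
```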

Perhaps it's sufficient for your use case that the final rename step is
atomic.
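If a consumer only needs to avoid reading partially written data, the atomic final rename is the property it can lean on. A minimal local-filesystem sketch, with wait_for_file as a hypothetical helper and cp/mv standing in for the HDFS copy and rename: the reader polls only for the final name and never touches the _COPYING_ file, so it sees either no file or a complete one.

```shell
#!/bin/sh
# Local-filesystem sketch of a reader relying on an atomic final rename.
set -eu

# wait_for_file PATH TRIES (hypothetical helper): poll once per second
# until PATH exists, giving up after TRIES attempts.
wait_for_file() {
  path=$1
  tries=$2
  while [ "$tries" -gt 0 ]; do
    if [ -e "$path" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

workdir=$(mktemp -d)
printf '<configuration/>\n' > "$workdir/a.xml"

# Writer in the background: copy to a temp name, then rename into place.
( cp "$workdir/a.xml" "$workdir/b.xml._COPYING_" \
  && mv "$workdir/b.xml._COPYING_" "$workdir/b.xml" ) &

# Reader: wait up to 10 attempts for the final name, then consume it.
if wait_for_file "$workdir/b.xml" 10; then
  cat "$workdir/b.xml"
fi
wait   # let the background writer finish before exiting
```

This covers partial reads only; it does not help with the leftover temporary file when the writer dies.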

--Chris Nauroth

On 5/25/16, 8:21 AM, "Kun Ren" <ren.hdfs@gmail.com> wrote:

>Hi Genius,
>If I understand correctly, the shell command "cp" for the HDFS is not
>atomic, is that correct?
>For example:
>./bin/hdfs dfs -cp input/a.xml input/b.xml
>This command actually does 3 things, 1. read input/a.xml; 2. Create a new
>file input/b.xml; 3. Write the content of a.xml to b.xml;
>When I looked at the code, the client side actually does the 3 steps,
>and there is no lock between the 3 steps. Does it mean that the cp command
>is not guaranteed to be atomic?
>Thanks a lot for your reply.
