hadoop-mapreduce-issues mailing list archives

From "Dmitriy V. Ryaboy (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-2161) DistCp should double-check copy size when expectation is unmet
Date Wed, 27 Oct 2010 19:37:27 GMT
DistCp should double-check copy size when expectation is unmet

                 Key: MAPREDUCE-2161
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2161
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Dmitriy V. Ryaboy

As a basic sanity check, DistCp verifies that the size of the copied file at the destination
matches the size of the file at the source.
When this check fails, DistCp logs something along the lines of
java.io.IOException: File size not matched: copied 3451980786 bytes (3.2g) to tmpfile (=hdfs://dest.hdfs/dir/_distcp_tmp_7uxv32/2010/10/26/20/fille)
but expected 3422552064 bytes (3.2g) from hdfs://source.hdfs/dir/file)

and attempts to retry. The expected file size is recorded during initialization, so it can
be stale by the time the copy runs. This can happen for at least two reasons: the file was
still being written to when DistCp was started (which is a bug in and of itself), or the file
was replaced at the source between the time the DistCp job was started and the time it
actually tried to copy the file.

It would make sense to get the *current* size of the source file when this condition is
encountered, and to proceed if the newly reported size matches the size of the copied file.
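
The proposed check could look roughly like the sketch below. This is not DistCp's actual code: the class `CopySizeCheck`, the method `verifyCopiedSize`, and the `SizeLookup` interface are hypothetical names used only for illustration. In a real patch the lookup would presumably wrap something like `srcFs.getFileStatus(srcPath).getLen()`; it is abstracted here so the example runs without Hadoop on the classpath.

```java
import java.io.IOException;

class CopySizeCheck {

    /** Hypothetical hook that fetches the source file's size at call time. */
    interface SizeLookup {
        long currentSize() throws IOException;
    }

    /**
     * Accepts the copy if the copied size matches the size recorded at job
     * start, or, failing that, the size the source reports right now.
     * Throws IOException (so the caller can retry) only if both checks fail.
     */
    static void verifyCopiedSize(long copiedBytes, long expectedBytes,
                                 SizeLookup source) throws IOException {
        if (copiedBytes == expectedBytes) {
            return; // normal case: the initial expectation still holds
        }
        // Expectation unmet: ask the source again before declaring failure.
        long currentBytes = source.currentSize();
        if (copiedBytes == currentBytes) {
            return; // the source changed after job start; the copy is current
        }
        throw new IOException("File size not matched: copied " + copiedBytes
                + " bytes but expected " + expectedBytes
                + " bytes (source currently reports " + currentBytes + " bytes)");
    }

    public static void main(String[] args) throws IOException {
        // Sizes taken from the log message above; the source is assumed to
        // have grown to the copied size after the job was initialized.
        verifyCopiedSize(3451980786L, 3422552064L, () -> 3451980786L);
        System.out.println("copy accepted after re-checking the source size");
    }
}
```

The second lookup only happens on the mismatch path, so the common case costs nothing extra. If the source is still changing, the re-check can fail too, and the existing retry behavior applies.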

