hadoop-mapreduce-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-6572) Improved incremental data copy of distcp for truncated file
Date Mon, 14 Dec 2015 19:42:46 GMT
Yongjun Zhang created MAPREDUCE-6572:

             Summary: Improved incremental data copy of distcp for truncated file
                 Key: MAPREDUCE-6572
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6572
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Yongjun Zhang
            Assignee: John Zhuge

MAPREDUCE-5899 improves distcp by supporting incremental data copy. That is, if a file has
only been appended to since it was last copied, only the new data needs to be copied.

That improvement was made before the HDFS truncate feature (HDFS-3107) was implemented. Now
that truncate is supported, if a large file is truncated even slightly, the whole file still
has to be copied, even with the MAPREDUCE-5899 change in place.

Creating this jira to improve the situation, possibly by remembering the smallest size the
file was truncated to, so that there is a chance to append only from that size onward.
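
To make the idea concrete, here is a minimal sketch in Python. It is not distcp code and not
a proposed patch; the function names and the min_source_len bookkeeping are hypothetical. It
assumes the target was an exact copy of the source at the last copy, and that the source has
since been changed only by truncate and append.

```python
def plan_incremental_copy(target_len, min_source_len, source_len):
    """Return (keep_len, append_len) for an incremental copy.

    target_len:     length of the target, i.e. the source length at the last copy
    min_source_len: smallest length the source has had since the last copy
                    (hypothetical bookkeeping, the "smallest truncated size")
    source_len:     current length of the source
    """
    if min(target_len, min_source_len, source_len) < 0:
        raise ValueError("lengths must be non-negative")
    if min_source_len > source_len:
        raise ValueError("smallest recorded length cannot exceed current length")
    # Bytes before this offset were never truncated away, so under the
    # stated assumptions they are still identical on source and target.
    keep_len = min(target_len, min_source_len)
    return keep_len, source_len - keep_len


def apply_plan(target, source, min_source_len):
    """Simulate the copy on in-memory bytes: truncate the target, then append."""
    keep_len, _ = plan_incremental_copy(len(target), min_source_len, len(source))
    return target[:keep_len] + source[keep_len:]
```

For example, if a 10-byte file is truncated to 8 bytes and then has 2 bytes appended,
plan_incremental_copy(10, 8, 10) returns (8, 2): the target is cut back to 8 bytes and only
the last 2 bytes are copied, rather than the whole file.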

