hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3294) distcp leaves empty blocks after successful execution
Date Fri, 25 Apr 2008 00:31:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592222#action_12592222 ]

Chris Douglas commented on HADOOP-3294:
---------------------------------------

bq. As a matter of fact, speculative execution was enabled before launching distcp

Running distcp with speculative execution turned on is definitely not supported. distcp disables
it before starting the job, but if it is somehow turned on during the copy, then distcp's behavior,
particularly with -update or -overwrite, is undefined.

> distcp leaves empty blocks after successful execution
> -----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Blocker
>             Fix For: 0.17.0
>
>         Attachments: 3294_20080423.patch, 3294_20080423b.patch
>
>
> I copied around 40 TB between two Hadoop clusters, with distcp running on the source cluster.
> The job was *successful*, but one destination file was empty because its only block was empty.
> None of the distcp log files mention this file.
> There were a couple of messages in the namenode server log of the destination cluster referencing the file:
> hadoop-xxxnamenode-yyy.log.2008-04-19:2008-04-19 02:19:15,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: destinationDir/_distcp_tmp_z0g93p/fileName. blk_-9209890281741927376
> hadoop-xxx-namenode-yyy.log.2008-04-19:2008-04-19 02:54:45,820 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on destinationDir/_distcp_tmp_z0g93p/fileName file does not exist.
> distcp should not rely on the user to double-check.
> Would it make sense to add a reducer to compare destination file sizes with source file sizes and take some appropriate action?
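
The size comparison proposed above could look roughly like the following. This is only a minimal sketch in plain Java, not distcp's code and not the attached patches: the class name CopySizeCheck, the method findMismatches, and the use of in-memory maps from relative path to file length are all assumptions made for illustration. In a real job the lengths would have to come from listings of the source and destination file systems.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: compares file lengths after a copy.
class CopySizeCheck {

    /**
     * Returns the relative paths whose destination length differs from the
     * source length, or which are missing at the destination entirely.
     */
    static List<String> findMismatches(Map<String, Long> sourceSizes,
                                       Map<String, Long> destSizes) {
        List<String> mismatches = new ArrayList<>();
        // Iterate in sorted path order so the report is deterministic.
        for (Map.Entry<String, Long> e : new TreeMap<>(sourceSizes).entrySet()) {
            Long destLen = destSizes.get(e.getKey());
            if (destLen == null || !destLen.equals(e.getValue())) {
                mismatches.add(e.getKey());
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Made-up example data: dir/b arrived empty, as in the report.
        Map<String, Long> source = new HashMap<>();
        source.put("dir/a", 1024L);
        source.put("dir/b", 2048L);

        Map<String, Long> dest = new HashMap<>();
        dest.put("dir/a", 1024L);
        dest.put("dir/b", 0L);

        System.out.println(findMismatches(source, dest));
    }
}
```

With the made-up data in main, only dir/b is reported. A length check like this would catch the empty-file case described in the report, but it would not detect corruption that leaves the length unchanged.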

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

