hadoop-hdfs-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
Date Thu, 08 Apr 2010 21:32:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855170#action_12855170 ]

Hairong Kuang commented on HDFS-1057:
-------------------------------------

> need to add to the recovery code to recompute the last chunk checksum during rbw recovery,
right? 
In trunk, this is not an issue. When the DN starts up, each rbw replica is changed to rwr
(ReplicaWaitingToBeRecovered). During this conversion the DN checks the last chunk and determines
the number of bytes in the block that match its crc. If the crc does not match, the last chunk
is thrown away. This does not violate hflush (sync) semantics because in this case some error
has occurred and other replicas may still have a good copy.
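
As a rough illustration of the startup check described above, here is a minimal sketch in
Java. It is not the DataNode code: the class and method names (LastChunkValidator,
validLength, crcOf) are hypothetical, in-memory arrays stand in for the block file and its
checksum file, and CRC32 values are held as long for simplicity. It verifies only the last
chunk and drops it if the stored crc does not match.

```java
import java.util.zip.CRC32;

// Hypothetical sketch: arrays stand in for the on-disk block and checksum files.
final class LastChunkValidator {

    /** CRC32 of data[off .. off+len). */
    static long crcOf(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    /**
     * Returns the number of bytes to keep: the full covered length if the
     * last chunk matches its stored crc, otherwise the length without that chunk.
     * Assumes bytesPerChecksum > 0.
     */
    static long validLength(byte[] data, int bytesPerChecksum, long[] storedCrcs) {
        if (data.length == 0 || storedCrcs.length == 0) {
            return 0;
        }
        // Only chunks that have a stored checksum can be validated.
        int chunksInData = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        int numChunks = Math.min(chunksInData, storedCrcs.length);

        int lastStart = (numChunks - 1) * bytesPerChecksum;
        int lastLen = Math.min(bytesPerChecksum, data.length - lastStart);

        if (crcOf(data, lastStart, lastLen) == storedCrcs[numChunks - 1]) {
            return (long) lastStart + lastLen; // last chunk is intact
        }
        return lastStart; // crc mismatch: throw the last chunk away
    }
}
```

Earlier chunks are assumed good in this sketch; only the tail, which may have been partially
written when the DN went down, is re-verified.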

But I am not sure about 0.20. I think it does something similar. Anyway, I think we can
ignore startup for this issue.

> is there an existing unit test for this case yet?
Todd posted some good concurrent reader tests at HDFS-1060. Please check whether you can use
them.

> Concurrent readers hit ChecksumExceptions if following a writer to very end of file
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-1057
>                 URL: https://issues.apache.org/jira/browse/HDFS-1057
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush().
Therefore, if there is a concurrent reader, it's possible to race here - the reader will see
the new length while those bytes are still in the buffers of BlockReceiver. Thus the client
will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the
file is made accessible to readers even though it is not stable.
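
The ordering problem in the description can be sketched with a toy model. This is not
BlockReceiver itself: the class ReplicaLengthOrdering and its methods are hypothetical, two
in-memory streams stand in for the receiver's buffer and the block file, and a single writer
thread is assumed. Each method reports whether the advertised length was covered by flushed
bytes at the moment it was published. It illustrates only the length/flush ordering, not the
unstable last checksum chunk.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical model of the race: not the actual BlockReceiver code.
final class ReplicaLengthOrdering {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(); // unflushed bytes
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();   // stands in for the block file
    private volatile long bytesOnDisk = 0; // length advertised to concurrent readers

    /** Publishes the new length before flushing, as the issue describes. */
    boolean receivePacketRacy(byte[] packet) {
        buffer.write(packet, 0, packet.length);
        bytesOnDisk = bytesOnDisk + packet.length;  // readers can now see the new length
        boolean covered = bytesOnDisk <= disk.size(); // but the bytes are still buffered
        flush();
        return covered;
    }

    /** Flushes first, then publishes the new length. */
    boolean receivePacketOrdered(byte[] packet) {
        buffer.write(packet, 0, packet.length);
        flush();
        bytesOnDisk = bytesOnDisk + packet.length;
        return bytesOnDisk <= disk.size();
    }

    long visibleLength() {
        return bytesOnDisk;
    }

    private void flush() {
        disk.write(buffer.toByteArray(), 0, buffer.size());
        buffer.reset();
    }
}
```

In the racy variant a reader that samples the length between the update and the flush would
try to read bytes that are not yet in the file, which matches the EOF and checksum errors
reported here.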

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

