hadoop-hdfs-issues mailing list archives

From "sam rash (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
Date Sat, 05 Jun 2010 00:04:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875801#action_12875801 ]

sam rash commented on HDFS-1057:

Thanks for the quick review.
I understand most of the comments, but have a couple of questions:

1. replicaVisibleLength was here before I made any changes.  Why is it not valid?  I understood
it to be an upper bound on the bytes that could be read from this block.  Is it the case that
start + length <= replicaVisibleLength and we want to optimize?

(the for loop to wait for bytes on disk >= visible length was here before, I just moved
it earlier in the constructor)

2. not sure I understand endOffset.  This was again a variable that already existed.  What
I thought you were getting at was the condition to decide if we should use the in-memory checksum
or not (which is what you describe).  

3. If we don't put the sync set/get method in ReplicaInPipelineInterface, we will have to
use an if/else construct on instanceof in BlockReceiver and call one or the other. I can
see the argument for keeping the method out of the interface since it is RBW-specific, but
on the other hand, it's effectively a no-op for other implementers of the interface and leads
to cleaner code (better natural polymorphism than if-else constructs to force it).

either way, just wanted to throw that out there as a question of style
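To make the style question in point 3 concrete, here is a minimal sketch of the two alternatives: declaring the method on the shared interface (a no-op for implementers it doesn't apply to) versus branching on instanceof at the call site. All class and method names below are illustrative, not the actual HDFS-1057 patch.

```java
// Hypothetical sketch: the sync setter lives on the interface, so the
// caller (BlockReceiver in the discussion) needs no instanceof check.
interface ReplicaInPipelineSketch {
    void setLastChecksumAndDataLen(long dataLen, byte[] checksum);
}

class ReplicaBeingWrittenSketch implements ReplicaInPipelineSketch {
    private long dataLen;
    private byte[] lastChecksum;

    public synchronized void setLastChecksumAndDataLen(long dataLen, byte[] checksum) {
        this.dataLen = dataLen;
        this.lastChecksum = checksum;
    }

    public synchronized long getDataLen() {
        return dataLen;
    }
}

class FinalizedReplicaSketch implements ReplicaInPipelineSketch {
    // Effectively a no-op for implementers that never serve in-flight reads.
    public void setLastChecksumAndDataLen(long dataLen, byte[] checksum) {
    }
}

public class PolymorphismSketch {
    // With the method on the interface, the caller stays branch-free;
    // the alternative would be: if (replica instanceof Rbw) { ... } else { ... }
    static void afterPacket(ReplicaInPipelineSketch replica, long len, byte[] sum) {
        replica.setLastChecksumAndDataLen(len, sum);
    }

    public static void main(String[] args) {
        ReplicaBeingWrittenSketch rbw = new ReplicaBeingWrittenSketch();
        afterPacket(rbw, 512L, new byte[]{1, 2});
        System.out.println(rbw.getDataLen());

        // Same call site works unchanged for the no-op implementer.
        afterPacket(new FinalizedReplicaSketch(), 512L, new byte[]{1, 2});
        System.out.println("ok");
    }
}
```

The trade-off is the one named above: the interface carries an RBW-specific method, but every caller is spared an instanceof cascade.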

> Concurrent readers hit ChecksumExceptions if following a writer to very end of file
> -----------------------------------------------------------------------------------
>                 Key: HDFS-1057
>                 URL: https://issues.apache.org/jira/browse/HDFS-1057
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0, 0.20-append
>            Reporter: Todd Lipcon
>            Assignee: sam rash
>            Priority: Blocker
>         Attachments: conurrent-reader-patch-1.txt, conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt,
> In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush().
> Therefore, if there is a concurrent reader, it's possible to race here - the reader will see
> the new length while those bytes are still in the buffers of BlockReceiver. Thus the client
> will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the
> file is made accessible to readers even though it is not stable.
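The race in the issue description comes down to ordering: the visible length is published before the data is flushed, so a concurrent reader can try to read bytes still sitting in the writer's buffer. A minimal sketch of the buggy versus fixed ordering, with illustrative names standing in for the real BlockReceiver/replica classes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushOrderSketch {
    static class ReplicaSketch {
        // volatile so a reader thread sees the published length
        private volatile long bytesOnDisk;

        void setBytesOnDisk(long n) {
            bytesOnDisk = n;
        }

        long getBytesOnDisk() {
            return bytesOnDisk;
        }
    }

    // Roughly the ordering the report describes: the length is published
    // while the packet may still sit in the stream's buffer.
    static void receivePacketBuggy(ReplicaSketch r, OutputStream out,
                                   byte[] packet, long newLen) throws IOException {
        out.write(packet);
        r.setBytesOnDisk(newLen);  // a reader may now see newLen...
        out.flush();               // ...before the bytes have actually hit disk
    }

    // Fixed ordering: flush first, then expose the new length to readers.
    static void receivePacketFixed(ReplicaSketch r, OutputStream out,
                                   byte[] packet, long newLen) throws IOException {
        out.write(packet);
        out.flush();
        r.setBytesOnDisk(newLen);
    }

    public static void main(String[] args) throws IOException {
        ReplicaSketch r = new ReplicaSketch();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        receivePacketFixed(r, out, new byte[100], 100L);
        // A reader that trusts getBytesOnDisk() can now safely read that many bytes.
        System.out.println(r.getBytesOnDisk() <= out.size());
    }
}
```

This only illustrates the length-publication half of the report; the last-checksum-chunk instability mentioned above is the part addressed by the in-memory checksum discussed in the comment.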

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
