hadoop-hdfs-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
Date Thu, 03 Jun 2010 19:59:32 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875276#action_12875276 ]

Hairong Kuang commented on HDFS-1057:
-------------------------------------

Thanks, Sam, for working on the patch for trunk. Here are my comments:
# BlockSender.java:
#* the condition replica.getBytesOnDisk() < replicaVisibleLength should be replica.getBytesOnDisk()
< startOffset + length. This guarantees that the bytes to be read have already been flushed
to disk.
#* When the while loop exits and the bytes still have not been flushed to disk, BlockSender
should throw an IOException, as in the sketch below.
#* It seems to me that we should remove the use of replicaVisibleLength from BlockSender.
#* the way to calculate endOffset should be (sketched below, where bytesPerChecksum is the
checksum chunk size):
{code}
long end = startOffset + length;
// start of the chunk that contains the last byte written to disk
long lastChunkStart =
    (chunkChecksum.getDataLength() / bytesPerChecksum) * bytesPerChecksum;
if (end > lastChunkStart) {
  // case 1: the read ends in the same chunk as chunkChecksum.getDataLength()
  endOffset = chunkChecksum.getDataLength();
} else {
  // round (startOffset + length) up to the boundary of the chunk it falls in
  endOffset = ((end + bytesPerChecksum - 1) / bytesPerChecksum) * bytesPerChecksum;
}
{code}
#* In case 1, the last chunk's checksum does not need to be read from disk.
# ReplicaInPipeline, ReplicaInPipelineInterface, and ReplicaBeingWritten
#* I do not think we need to make any change to ReplicaInPipeline and ReplicaInPipelineInterface.
#* We just need to add the attribute lastChecksum and two synchronized methods to ReplicaBeingWritten.
Would it be more readable if we named the methods getLastChecksumAndDataLen and setLastChecksumAndDataLen?
A sketch follows below.
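A minimal sketch of the ReplicaBeingWritten additions I have in mind (only the two method names come from the comment above; the field type and the use of ChunkChecksum as the return value are assumptions):
{code}
// Sketch only: possible shape of the additions to ReplicaBeingWritten.
private byte[] lastChecksum; // checksum of the last, possibly partial chunk

// return the on-disk data length and the matching last-chunk checksum
// as one consistent pair
public synchronized ChunkChecksum getLastChecksumAndDataLen() {
  return new ChunkChecksum(getBytesOnDisk(), lastChecksum);
}

// update the on-disk length and the last chunk's checksum atomically
public synchronized void setLastChecksumAndDataLen(long dataLength, byte[] checksum) {
  setBytesOnDisk(dataLength);
  this.lastChecksum = checksum;
}
{code}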

> Concurrent readers hit ChecksumExceptions if following a writer to very end of file
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-1057
>                 URL: https://issues.apache.org/jira/browse/HDFS-1057
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0, 0.20-append
>            Reporter: Todd Lipcon
>            Assignee: sam rash
>            Priority: Blocker
>         Attachments: conurrent-reader-patch-1.txt, conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, hdfs-1057-trunk-1.txt
>
>
> In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush().
> Therefore, if there is a concurrent reader, it's possible to race here: the reader will see
> the new length while those bytes are still in the buffers of BlockReceiver. Thus the client
> will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the
> file is made accessible to readers even though it is not stable.
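For reference, on the writer side the fix boils down to reordering the flush and the length update in BlockReceiver.receivePacket, roughly as follows (a sketch, not the actual patch):
{code}
// Sketch: flush first, then publish the new length, so a concurrent reader
// can never observe bytes that are still sitting in the stream buffers.
out.flush();                                // packet data is now on disk
checksumOut.flush();                        // matching checksums are on disk
replicaInfo.setBytesOnDisk(offsetInBlock);  // only now visible to readers
{code}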

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

