hadoop-hdfs-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
Date Wed, 16 Jun 2010 18:54:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879454#action_12879454 ]

Hairong Kuang commented on HDFS-1057:
-------------------------------------

Sam, the patch is in good shape. Thanks for working on this. A few minor comments:
1. ReplicaBeingWritten.java: dataLength and bytesOnDisk are the same, right? We do not need
to introduce another field dataLength. I am also hesitant to declare dataLength & lastChecksum
as volatile. Accesses to them are already synchronized, and the normal case is writing
without reading.
2. We probably should remove setBytesOnDisk from ReplicaInPipelineInterface & ReplicaInPipeline.
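
To illustrate point 1, here is a minimal sketch (the class and method names are hypothetical, not the actual ReplicaBeingWritten code): when every read and write of a field goes through methods synchronized on the same monitor, the Java memory model already guarantees visibility between threads, so also marking the field volatile adds nothing.

```java
// Hypothetical sketch, not the real HDFS code: fields guarded by "this"
// via synchronized accessors do not need to be volatile as well.
public class ReplicaSketch {
    private long bytesOnDisk;     // guarded by "this"
    private byte[] lastChecksum;  // guarded by "this"

    public synchronized void setLastChecksumAndDataLen(long n, byte[] checksum) {
        bytesOnDisk = n;
        lastChecksum = checksum;
    }

    public synchronized long getBytesOnDisk() {
        // A reader entering this monitor sees the writer's latest update:
        // unlock/lock on the same monitor establishes happens-before.
        return bytesOnDisk;
    }
}
```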

> In 0.20, I made it so that client just treats this as a 0-length file. one of our internal
tools saw this rather frequently in 0.20.
Good to know this. Then could you please handle this case the same way in trunk? Thanks
again, Sam.

> Concurrent readers hit ChecksumExceptions if following a writer to very end of file
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-1057
>                 URL: https://issues.apache.org/jira/browse/HDFS-1057
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.20-append, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: sam rash
>            Priority: Blocker
>         Attachments: conurrent-reader-patch-1.txt, conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt,
hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt
>
>
> BlockReceiver.receivePacket calls replicaInfo.setBytesOnDisk before calling flush().
Therefore, if there is a concurrent reader, it's possible to race here: the reader will see
the new length while those bytes are still in the buffers of BlockReceiver. Thus the client
will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the
file is made accessible to readers even though it is not stable.
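
The ordering bug in the description above can be sketched as follows (the class and field names are illustrative, not the actual BlockReceiver code; an in-memory stream stands in for the disk): publishing the on-disk length before flushing lets a concurrent reader observe a length the file does not yet satisfy, while flushing first closes that window.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical sketch of the race: "visibleLength" plays the role of
// setBytesOnDisk's published value; "disk" stands in for the block file.
public class ReceiverSketch {
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();
    private final BufferedOutputStream out = new BufferedOutputStream(disk);
    private volatile long visibleLength = 0;  // what a concurrent reader sees

    // Buggy order: the length becomes visible while the bytes may still
    // sit in the BufferedOutputStream, so a reader can hit EOF/checksum errors.
    public void receivePacketBuggy(byte[] data) throws IOException {
        out.write(data);
        visibleLength += data.length;  // published before flush: race window
        out.flush();
    }

    // Fixed order: flush to "disk" first, then publish the new length.
    public void receivePacketFixed(byte[] data) throws IOException {
        out.write(data);
        out.flush();                   // bytes are on disk before ...
        visibleLength += data.length;  // ... readers can see the new length
    }

    public long getVisibleLength() { return visibleLength; }
    public int bytesActuallyOnDisk() { return disk.size(); }
}
```

With the fixed ordering, visibleLength never exceeds bytesActuallyOnDisk, which is the invariant a concurrent reader needs.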

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

