hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
Date Mon, 22 Mar 2010 02:31:27 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1057:
------------------------------

    Description: BlockReceiver.receivePacket calls replicaInfo.setBytesOnDisk before calling
flush(). If there is a concurrent reader, a race is therefore possible: the reader sees the
new length while those bytes are still in BlockReceiver's buffers, so the client may hit
checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible
to readers even though it is not yet stable.  (was: In BlockReceiver.receivePacket,
it calls replicaInfo.setBytesOnDisk before calling flush(). Therefore, if there is a concurrent
reader, it's possible to race here - the reader will see the new length while those bytes
are still in the buffers of BlockReceiver. Thus the client will potentially see checksum errors
or EOFs.)
        Summary: Concurrent readers hit ChecksumExceptions if following a writer to very end of file  (was: BlockReceiver records block length in replicaInfo before flushing)

This problem is worse than originally reported. Switching the order of flush() and setBytesOnDisk
doesn't solve it, because the last checksum in the meta file is still changing. Since the data
is not read synchronously with its checksum, a client trying to read the last several bytes
of a file under construction will get checksum errors.

Solving this is likely to be very tricky...
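To make the deeper problem concrete, here is a minimal, hypothetical sketch (not HDFS code; plain java.util.zip.CRC32 stands in for the block meta file's chunk checksums, and all names are invented). The checksum covering the last, partial chunk is rewritten in place every time that chunk grows, so a reader who fetches new data bytes but races with the meta-file update holds a checksum computed over an older version of the chunk, and verification fails even though both reads individually succeeded:

```java
import java.util.zip.CRC32;

class PartialChunkChecksumSketch {
    // Checksum of the first len bytes of a (possibly partial) chunk.
    static long crcOf(byte[] data, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, len);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] chunk = new byte[512];

        // Writer state at time T1: 5 bytes in the last (partial) chunk.
        System.arraycopy("hello".getBytes(), 0, chunk, 0, 5);
        long checksumT1 = crcOf(chunk, 5);

        // Writer appends more bytes; the stored checksum for this same
        // chunk is rewritten in place to cover the longer partial chunk.
        System.arraycopy(" world".getBytes(), 0, chunk, 5, 6);
        long checksumT2 = crcOf(chunk, 11);

        // A reader that sees the 11 new data bytes but the old (T1)
        // checksum cannot verify them: the two versions never match.
        assert checksumT1 != checksumT2;
        System.out.println("stale checksum matches new data: "
                + (crcOf(chunk, 11) == checksumT1)); // prints false
    }
}
```

This is why merely reordering flush() and setBytesOnDisk is insufficient: the data file only ever grows, but the last entry of the meta file is mutated, so data and checksum must be exposed to readers atomically (or the reader must tolerate a stale tail).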

> Concurrent readers hit ChecksumExceptions if following a writer to very end of file
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-1057
>                 URL: https://issues.apache.org/jira/browse/HDFS-1057
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> BlockReceiver.receivePacket calls replicaInfo.setBytesOnDisk before calling flush(). If
> there is a concurrent reader, a race is therefore possible: the reader sees the new length
> while those bytes are still in BlockReceiver's buffers, so the client may hit checksum
> errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to
> readers even though it is not yet stable.
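As a rough illustration of the ordering bug quoted above, here is a minimal, hypothetical sketch (the class, fields, and buffering are invented stand-ins, not the real BlockReceiver): publishing the length before flushing lets a reader observe a length whose bytes have not yet reached stable storage.

```java
import java.util.Arrays;

class PublishBeforeFlushSketch {
    static byte[] disk = new byte[32];     // stand-in for stable storage
    static volatile int bytesOnDisk = 0;   // the length readers trust
    static byte[] buffer = new byte[32];   // writer-side buffering
    static int buffered = 0;

    static void write(byte[] data) {
        System.arraycopy(data, 0, buffer, buffered, data.length);
        buffered += data.length;
    }

    static void flush() {
        // Move buffered bytes to "disk", where readers can see them.
        System.arraycopy(buffer, 0, disk, 0, buffered);
    }

    static byte[] readerView() {
        // A reader reads exactly bytesOnDisk bytes from stable storage.
        return Arrays.copyOf(disk, bytesOnDisk);
    }

    public static void main(String[] args) {
        write("hello".getBytes());

        bytesOnDisk = buffered;        // buggy order: publish length first...
        byte[] racy = readerView();    // ...a reader scheduled here gets
        System.out.println(Arrays.toString(racy)); // five zero bytes

        flush();                       // ...and only then flush
        System.out.println(new String(readerView())); // now "hello"
    }
}
```

The racy read returns the published number of bytes, but their content is whatever was on "disk" before the flush, which is exactly the length/content mismatch that surfaces as ChecksumException or EOF in the report.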

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

