hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2288) Replicas awaiting recovery should return a full visible length
Date Wed, 01 Feb 2012 01:12:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197474#comment-13197474 ]

Todd Lipcon commented on HDFS-2288:

Sorry to have left this ticket idle for such a long time. We're circling back to finish getting
HBase tests passing on 23/trunk and ran afoul of this again. To continue the prior discussion:

bq. We also provide read consistency, i.e. if N bytes are successfully read from one datanode,
then the same N bytes are available from all datanodes in the pipeline, so that the client
can switch to other datanodes and continue reading.

By my understanding of the code, we do not provide this guarantee. For example, consider:
- Client writing to 3 DNs
- The network link between DN1 and DN2 in the pipeline is severed.
- DN2 is sending an "ack" for some bytes back to DN1, but gets stuck sending it over the severed
network link.

During the window before the pipeline times out, the "bytesAcked" counter on DN3 will be
higher than the "bytesAcked" counter on DN1, because the ack never made it past DN2. So if a
reader connects to DN3 and then reconnects to DN1, it will see fewer visible bytes.
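The divergence can be reproduced with a toy model. This is a minimal sketch, not HDFS code: `PipelineAckModel` and its methods are hypothetical, and the model assumes each datanode records an ack in its own `bytesAcked` counter before forwarding it upstream, which is what the scenario relies on.

```java
import java.util.Arrays;

/**
 * Hypothetical toy model of a write pipeline (not HDFS code). Acks originate
 * at the last datanode and travel upstream, DN3 -> DN2 -> DN1 -> client, and
 * each datanode records the ack in its own bytesAcked counter before
 * forwarding it.
 */
final class PipelineAckModel {
    // bytesAcked[0] is DN1, bytesAcked[1] is DN2, bytesAcked[2] is DN3.
    private final long[] bytesAcked;
    // severedLink[i] == true means the link between datanode i and datanode
    // i + 1 (0-based) is down, so acks cannot cross it going upstream.
    private final boolean[] severedLink;

    PipelineAckModel(int pipelineSize) {
        this.bytesAcked = new long[pipelineSize];
        this.severedLink = new boolean[pipelineSize - 1];
    }

    /** Sever the link just downstream of datanode index {@code upstream}. */
    void severLinkBelow(int upstream) {
        severedLink[upstream] = true;
    }

    /** Deliver an ack covering the first {@code ackedBytes} bytes of the block. */
    void ack(long ackedBytes) {
        for (int dn = bytesAcked.length - 1; dn >= 0; dn--) {
            bytesAcked[dn] = Math.max(bytesAcked[dn], ackedBytes);
            if (dn > 0 && severedLink[dn - 1]) {
                break; // the ack is stuck on the severed link; upstream never sees it
            }
        }
    }

    long bytesAckedOn(int dn) {
        return bytesAcked[dn];
    }

    public static void main(String[] args) {
        PipelineAckModel p = new PipelineAckModel(3);
        p.ack(64 * 1024);     // healthy pipeline: all three replicas agree
        p.severLinkBelow(0);  // the DN1 <-> DN2 link goes down
        p.ack(128 * 1024);    // DN3 and DN2 record this ack, DN1 never does
        System.out.println(Arrays.toString(p.bytesAcked)); // prints [65536, 131072, 131072]
    }
}
```

A reader that sees 131072 bytes on DN3 and then fails over to DN1 finds only 65536 visible bytes there, which is exactly the guarantee violation described above.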

So I would argue that the guarantee quoted above does not actually hold.

Let me look deeper into the HBase test to understand whether it's a case that could happen
in practice. Perhaps the correct result is not to return a "0" visible length but rather to
throw an exception, forcing the client to retry or bail.
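One way the exception-based alternative could look, sketched under loud assumptions: `VisibleLengthSketch`, `State`, `Replica`, and `ReplicaNotReadableException` are hypothetical names, not the real HDFS classes, and the sketch needs Java 16+ for records. A replica awaiting recovery (RWR) refuses to report a length instead of answering 0, and the client moves on to the next datanode, bailing only if every replica refuses.

```java
import java.io.IOException;
import java.util.List;

/**
 * Hypothetical sketch (illustrative names, not HDFS's real classes) of
 * refusing to report a visible length for a replica awaiting recovery,
 * instead of answering 0.
 */
final class VisibleLengthSketch {
    // Simplified states: finalized, being written (RBW), waiting to be recovered (RWR).
    enum State { FINALIZED, RBW, RWR }

    /** Stand-in for one datanode's view of its replica. */
    record Replica(String datanode, State state, long numBytes, long bytesAcked) {}

    /** Signals "no trustworthy length here" rather than a misleading 0. */
    static final class ReplicaNotReadableException extends IOException {
        ReplicaNotReadableException(String msg) {
            super(msg);
        }
    }

    /** Datanode side: report a visible length, or refuse for an RWR replica. */
    static long visibleLength(Replica r) throws ReplicaNotReadableException {
        return switch (r.state()) {
            case FINALIZED -> r.numBytes();
            // This sketch treats acked bytes as visible, matching the bytesAcked counter above.
            case RBW -> r.bytesAcked();
            case RWR -> throw new ReplicaNotReadableException(
                    r.datanode() + ": replica is awaiting recovery");
        };
    }

    /** Client side: try each replica in turn and bail only if none can answer. */
    static long visibleLengthFromAny(List<Replica> replicas) throws IOException {
        IOException last = null;
        for (Replica r : replicas) {
            try {
                return visibleLength(r);
            } catch (ReplicaNotReadableException e) {
                last = e; // retry against the next datanode
            }
        }
        throw new IOException("no replica could report a visible length", last);
    }
}
```

The design choice here is that a 0 can be mistaken for "the block is empty", whereas an exception makes the "no answer" case explicit and leaves the retry-or-bail decision to the client.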

> Replicas awaiting recovery should return a full visible length
> --------------------------------------------------------------
>                 Key: HDFS-2288
>                 URL: https://issues.apache.org/jira/browse/HDFS-2288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.24.0
>         Attachments: hdfs-2288.txt
> Currently, if the client calls getReplicaVisibleLength for a replica in the RWR (replica waiting
to be recovered) state, it returns a visible length of 0. This causes one of HBase's tests to fail,
and I believe it's incorrect behavior.


