hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2288) Replicas awaiting recovery should return a full visible length
Date Fri, 09 Mar 2012 09:25:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225956#comment-13225956 ]

Konstantin Shvachko commented on HDFS-2288:
-------------------------------------------

> My understanding of visible length is "the length that all datanodes in the pipeline
> contain at least such amount of data."

There is no trusted source from which to obtain such information, unless you keep it in ZooKeeper or
want to address the Byzantine Generals' Problem internally, which we do not.

Let me try to explain the notion of *visible length*.
As per the [design doc|https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf],
visible length is the _"number of bytes that have been acknowledged by the downstream DataNodes"_.
It is specific to a replica, not to a block, meaning it can differ between replicas at
a given time. In the document it is called BA (bytes acknowledged), as opposed to BR (bytes
received).

Suppose we have three replicas: r1, r2, r3. All of them could have received the same number of
bytes:
r1.BR = r2.BR = r3.BR,
but their visible lengths differ, because r3 has not yet acknowledged the latest packet to r2
and r1. Until it does,
r3.BA = r3.BR
r2.BA = r2.BR - p
r1.BA = r1.BR - p
where p is the packet length.
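
The BA/BR relationship above can be sketched as a toy model. This is only an illustration, not HDFS code: the Replica class and the send_packet and propagate_ack helpers are hypothetical names, and the sketch assumes r3 is the last node in the pipeline, so that it has no downstream ack to wait for.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    bytes_received: int = 0  # BR
    bytes_acked: int = 0     # BA, this replica's visible length


def send_packet(pipeline, packet_len):
    """Deliver one packet to every replica; upstream acks are still pending."""
    for replica in pipeline:
        replica.bytes_received += packet_len
    # Assumption: the tail has no downstream node, so its BA equals its BR.
    tail = pipeline[-1]
    tail.bytes_acked = tail.bytes_received


def propagate_ack(pipeline):
    """The tail's ack reaches the upstream replicas, which catch BA up to BR."""
    for replica in pipeline[:-1]:
        replica.bytes_acked = replica.bytes_received


p = 512  # packet length, arbitrary for the example
r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
pipeline = [r1, r2, r3]

send_packet(pipeline, p)
propagate_ack(pipeline)   # first packet is acknowledged everywhere
send_packet(pipeline, p)  # second packet received, r3's ack still in flight

for replica in pipeline:
    print(replica.name, "BR =", replica.bytes_received, "BA =", replica.bytes_acked)
```

At the point where the loop prints, all three replicas hold the same number of received bytes, while r1 and r2 lag r3 by one packet in visible length.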

Now, when a client reads a byte, it first checks with one of the replicas, say r3, whether
the byte is visible. The last received byte is visible in r3, and this means the client
can read it from any replica. When the client then reads that byte from r1, it sends
r1 the visible length obtained from r3. The DN holding r1 sees that the client has already
confirmed with another replica that the byte is visible there, and lets the client read
that byte, even though it is not yet locally visible.

So our consistency guarantee is that after a client has read a byte from one replica, that
client (or any other client that knows of the read) can read the same byte from any other replica.
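
A minimal sketch of that read check follows. It is not the DataNode implementation: ReplicaReader and its methods are hypothetical names, and capping the client-supplied length at the bytes actually received is my assumption, not something stated in the design doc quote.

```python
class ReplicaReader:
    """Toy stand-in for a DataNode serving reads from one replica."""

    def __init__(self, data: bytes, bytes_acked: int):
        self.data = data                # everything received so far (BR = len(data))
        self.bytes_acked = bytes_acked  # locally visible length (BA)

    def visible_length(self) -> int:
        return self.bytes_acked

    def read(self, offset: int, client_visible_length: int = 0) -> int:
        # Honor a length the client confirmed with another replica, but
        # (assumption) never beyond what this replica has received.
        confirmed = min(client_visible_length, len(self.data))
        limit = max(self.bytes_acked, confirmed)
        if offset >= limit:
            raise ValueError(f"offset {offset} is not visible (limit {limit})")
        return self.data[offset]


payload = b"abcdefgh"
p = 4  # packet length; the last packet is not yet acked at r1

r3 = ReplicaReader(payload, bytes_acked=len(payload))      # BA = BR
r1 = ReplicaReader(payload, bytes_acked=len(payload) - p)  # BA = BR - p

confirmed = r3.visible_length()  # client checks with r3 first
last = len(payload) - 1
print(r1.read(last, client_visible_length=confirmed))  # allowed via r3's length
```

Without the length obtained from r3, the same read against r1 is rejected, because the byte is not yet locally visible there.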
                
> Replicas awaiting recovery should return a full visible length
> --------------------------------------------------------------
>
>                 Key: HDFS-2288
>                 URL: https://issues.apache.org/jira/browse/HDFS-2288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.24.0
>
>         Attachments: hdfs-2288.txt
>
>
> Currently, if the client calls getReplicaVisibleLength for a RWR, it returns a visible
> length of 0. This causes one of HBase's tests to fail, and I believe it's incorrect behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
