hadoop-hdfs-issues mailing list archives

From "Yanbo Liang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3219) Disambiguate "visible length" in the code and docs
Date Tue, 16 Oct 2012 09:29:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476875#comment-13476875 ]

Yanbo Liang commented on HDFS-3219:
-----------------------------------

If the last block is incomplete and in the RBW (replica being written) state, the visible
lengths of different replicas of the same block may differ. DFSInputStream asks one of the
DataNodes that store a replica for the size of the last block, so the result may depend on
which DataNode it asks. Does that make sense?
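A minimal Java sketch of the scenario in this comment, assuming a hypothetical snapshot in which three RBW replicas of the last block report different acknowledged lengths. `ReplicaLengthDemo`, `firstReported` and `minReported` are illustrative names, not HDFS APIs: `firstReported` mimics asking a single DataNode, while `minReported` takes the floor across all replicas, which does not depend on which DataNode is asked.

```java
import java.util.List;

// Hypothetical model, not HDFS code: each list entry is the acknowledged-byte
// count ("visible length" in the per-replica sense) that one DataNode reports
// for its RBW replica of the last block.
public class ReplicaLengthDemo {

    // Mimics asking a single DataNode: returns whatever the first replica reports,
    // so the answer depends on which DataNode happens to be consulted first.
    static long firstReported(List<Long> replicaLengths) {
        if (replicaLengths.isEmpty()) {
            throw new IllegalArgumentException("no replicas");
        }
        return replicaLengths.get(0);
    }

    // Floor over all replicas: the number of bytes every replica is known to hold.
    // The result is independent of the order in which replicas are consulted.
    static long minReported(List<Long> replicaLengths) {
        return replicaLengths.stream()
                .mapToLong(Long::longValue)
                .min()
                .orElseThrow(() -> new IllegalArgumentException("no replicas"));
    }

    public static void main(String[] args) {
        // Assumed snapshot: the same three replicas, listed in two different orders.
        List<Long> orderA = List.of(4096L, 3584L, 3072L);
        List<Long> orderB = List.of(3072L, 4096L, 3584L);

        System.out.println("first-found, order A: " + firstReported(orderA)); // 4096
        System.out.println("first-found, order B: " + firstReported(orderB)); // 3072
        System.out.println("floor, order A:       " + minReported(orderA));   // 3072
        System.out.println("floor, order B:       " + minReported(orderB));   // 3072
    }
}
```

The first-found value changes with the order of the replicas while the floor does not, which is the inconsistency the comment asks about.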
                
> Disambiguate "visible length" in the code and docs
> --------------------------------------------------
>
>                 Key: HDFS-3219
>                 URL: https://issues.apache.org/jira/browse/HDFS-3219
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Eli Collins
>            Priority: Minor
>
> In HDFS-2288 there are two definitions of visible length, or rather we're using the same
> name for two things:
> 1. The HDFS-265 design doc, which defines it as a property of the replica:
> {quote}
> visible length is the "number of bytes that have been acknowledged by the downstream
> DataNodes". It is replica (not block) specific, meaning it can be different for different
> replicas at a given time. In the document it is called BA (bytes acknowledged), compared to
> BR (bytes received).
> {quote}
> 2. The definition in HDFS-814 and DFSClient#getVisibleLength, which defines it as a property
> of a file:
> {quote}
> The visible length is the length that *all* datanodes in the pipeline contain at least
> such amount of data. Therefore, these data are visible to the readers.
> {quote}
> According to this definition, the visible length of a file is the floor (minimum) of the
> visible lengths of all the replicas of the last block. It is a static property set on open,
> e.g., it is not updated when a writer calls hflush. Also, DFSInputStream#readBlockLength
> returns the first visible length of a replica it finds, so it seems possible (though
> unlikely) that in a failure scenario it could return a length that not all replicas had.
> This has caused confusion in a number of other JIRAs. We should update the design doc and
> the javadoc, and perhaps rename DFSClient#getVisibleLength etc., to disambiguate this.
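The two definitions quoted in the description can be contrasted in a short Java sketch. `VisibleLengthSketch`, `ReplicaState` and `openSnapshot` are hypothetical names, not HDFS classes. Each replica keeps BR and BA counters as described for HDFS-265, and the file-level value is captured once at open as the floor of the last block's replica lengths. Adding the length of already-completed blocks is an assumption made so the result reads as a file length.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: contrasts the two meanings of "visible length".
public class VisibleLengthSketch {

    // Definition 1 (HDFS-265 design doc): a per-replica property.
    // bytesReceived plays the role of BR, bytesAcked the role of BA (always <= BR).
    static final class ReplicaState {
        private long bytesReceived;
        private long bytesAcked;

        void receive(long n) {
            bytesReceived += n;
        }

        // Record that downstream DataNodes have acknowledged up to 'upTo' bytes.
        void ackUpTo(long upTo) {
            if (upTo > bytesReceived) {
                throw new IllegalStateException("cannot ack bytes not yet received");
            }
            bytesAcked = Math.max(bytesAcked, upTo);
        }

        long visibleLength() {
            return bytesAcked;
        }
    }

    // Definition 2 (as described for HDFS-814 / DFSClient#getVisibleLength):
    // a per-file value computed once at open. completedBlocksLength is an
    // assumed input; the last block contributes the floor of its replicas' BA.
    static long openSnapshot(long completedBlocksLength, List<ReplicaState> lastBlockReplicas) {
        long floor = Long.MAX_VALUE;
        for (ReplicaState r : lastBlockReplicas) {
            floor = Math.min(floor, r.visibleLength());
        }
        return completedBlocksLength + (lastBlockReplicas.isEmpty() ? 0 : floor);
    }

    public static void main(String[] args) {
        List<ReplicaState> replicas = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            ReplicaState r = new ReplicaState();
            r.receive(1024);
            replicas.add(r);
        }
        // Replicas have had different amounts acknowledged at this instant.
        replicas.get(0).ackUpTo(1024);
        replicas.get(1).ackUpTo(512);
        replicas.get(2).ackUpTo(768);

        long atOpen = openSnapshot(0, replicas); // floor of 1024, 512, 768

        // The writer flushes more data and every replica acknowledges it.
        for (ReplicaState r : replicas) {
            r.receive(1024);
            r.ackUpTo(2048);
        }

        System.out.println("length captured at open: " + atOpen);               // 512
        System.out.println("floor recomputed now:    " + openSnapshot(0, replicas)); // 2048
    }
}
```

Because the snapshot is an ordinary value captured at open, later acknowledgements (for example after the writer calls hflush) change the replicas but not the length the reader already holds, which is the static behavior the description points out.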

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
