hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11708) positional read will fail if replicas moved to different DNs after stream is opened
Date Tue, 23 May 2017 04:22:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020641#comment-16020641 ]

Vinayakumar B commented on HDFS-11708:

bq. Not sure I understand what you are trying to fix here. It looks like that chooseDataNode()
already calls refreshLocatedBlock() in case when all other locations failed. So your patch
adds another call to NameNode unconditionally on retry, probably not the best thing to do
from performance viewpoint.
I think you may have been confused by the name {{refreshLocatedBlock()}}, which actually DOES
NOT fetch locations from the NameNode if they are already cached for the given position. It just
picks the specific LocatedBlock from the already fetched LocatedBlocks. The actual NameNode call to
refetch happens in {{openInfo(true)}} inside {{chooseDataNode()}}.
The issue was that the chosen LocatedBlock was stale: even though the cached locations were updated
internally during retry in {{chooseDataNode()}}, the retry continued with the old LocatedBlock
reference, which was created outside the {{while}} loop. Now, on retry, the LocatedBlock is chosen
again from the newly cached locations.
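
To make the stale-reference problem concrete, here is a minimal, self-contained sketch. It is not the actual {{DFSInputStream}} code: {{Block}}, {{Cache}}, {{pick()}}, {{refetch()}}, {{read()}} and the {{LIVE}} set are hypothetical stand-ins, and the model assumes the cache is refreshed only after every location of the chosen block has failed. It only shows why re-picking the block inside the retry loop matters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StaleBlockRetryDemo {
    /** Toy stand-in for a located block: a fixed snapshot of replica hosts. */
    static final class Block {
        final List<String> hosts;
        Block(String... hosts) { this.hosts = Arrays.asList(hosts); }
    }

    /** Toy client-side cache; refetch() models asking the NameNode again. */
    static final class Cache {
        Block cached = new Block("DN1", "DN2");       // locations when the stream was opened
        final Block latest = new Block("DN1", "DN3"); // what the NameNode would report now
        void refetch() { cached = latest; }           // replaces the cached snapshot
        Block pick() { return cached; }               // cache lookup only, no remote call
    }

    /** Hosts that can actually serve the block: DN1 is down, DN2 lost its replica. */
    static final Set<String> LIVE = new HashSet<>(Arrays.asList("DN3"));

    /** Returns the host that served the read, or null if every attempt failed. */
    static String read(Cache cache, boolean repickOnRetry, int maxAttempts) {
        Block block = cache.pick();                   // chosen once, before the loop
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (repickOnRetry) {
                block = cache.pick();                 // re-pick from the possibly refreshed cache
            }
            for (String host : block.hosts) {
                if (LIVE.contains(host)) {
                    return host;
                }
            }
            cache.refetch();                          // all hosts failed: refresh cached locations
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("stale reference: " + read(new Cache(), false, 3));
        System.out.println("re-picked block: " + read(new Cache(), true, 3));
    }
}
```

In this toy model the first call keeps retrying against DN1 and DN2 even though the cache already holds DN3, so it never succeeds. The second call picks up the refreshed locations on its next attempt and reads from DN3.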

bq. It would be good if you could provide a unit test, that fails without your fix.
I have already included a test in the patch: {{TestPread#testPreadFailureWithChangedBlockLocations()}}.

> positional read will fail if replicas moved to different DNs after stream is opened
> -----------------------------------------------------------------------------------
>                 Key: HDFS-11708
>                 URL: https://issues.apache.org/jira/browse/HDFS-11708
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.3
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>            Priority: Critical
>              Labels: release-blocker
>         Attachments: HDFS-11708-01.patch, HDFS-11708-02.patch, HDFS-11708-03.patch, HDFS-11708-04.patch
> Scenario:
> 1. File was written to DN1, DN2 with RF=2
> 2. File stream was opened for reading and kept open. Block locations are [DN1,DN2]
> 3. One of the replicas (DN2) was moved to another datanode (DN3) due to a dead datanode, balancing, etc.
> 4. Latest block locations in NameNode will be DN1 and DN3 in the 'same order'
> 5. DN1 went down, but has not yet been detected as dead by the NameNode.
> 6. Client starts reading using the positional read API "read(pos, buf[], offset, length)"
