hadoop-hdfs-issues mailing list archives

From "Chao Sun (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-13924) Handle BlockMissingException when reading from observer
Date Thu, 18 Oct 2018 06:18:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654691#comment-16654691
] 

Chao Sun edited comment on HDFS-13924 at 10/18/18 6:17 AM:
-----------------------------------------------------------

Thanks [~vagarychen] for taking a look. The test fails because we mocked the block manager
to generate empty block locations. The old test expects the request to go to the observer after
exiting safe mode, which is no longer true with this fix. The test passes after I reset the mock
before exiting safe mode.

Attached patch v1.


was (Author: csun):
Thanks [~vagarychen] for taking a look. The test fails because we mocked the block manager
to generate empty block locations. The old test expects the request to go to the observer after
exiting safe mode, but with this fix that is no longer true: the request should be redirected to
the active. I reset the mock before exiting safe mode, so it should be OK now.

> Handle BlockMissingException when reading from observer
> -------------------------------------------------------
>
>                 Key: HDFS-13924
>                 URL: https://issues.apache.org/jira/browse/HDFS-13924
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>         Attachments: HDFS-13924-HDFS-12943.000.patch, HDFS-13924-HDFS-12943.001.patch
>
>
> Internally we found that reading from ObserverNode may result in {{BlockMissingException}}.
This may happen when the observer sees fewer DNs than the active (maybe due to communication
issues with those DNs), or (we guess) when block reports from some DNs reach the observer late. This
error happens in [DFSInputStream#chooseDataNode|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L846],
when no valid DN can be found for the {{LocatedBlock}} obtained from the NN.
> One potential solution (although a little hacky) is to have the {{DFSInputStream}}
retry against the active when this happens. The retry logic is already present in the code; we just
have to dynamically set a flag to ask the {{ObserverReadProxyProvider}} to try the active in this case.
> cc [~shv], [~xkrogen], [~vagarychen], [~zero45] for discussion.
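
For readers following the thread, the fallback idea in the description can be sketched roughly as below. This is only an illustrative, self-contained model and not the code in the attached patch: ObserverFallbackSketch, MissingBlockException, NameNode, ProxyProvider and the useActive flag are hypothetical stand-ins, not the real HDFS classes or the actual ObserverReadProxyProvider API.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the proposal; none of these types are real HDFS classes.
class ObserverFallbackSketch {

    /** Stand-in for the error raised when a block has no usable DataNode. */
    static class MissingBlockException extends IOException {
        MissingBlockException(String message) {
            super(message);
        }
    }

    /** Stand-in for a NameNode that returns DataNode locations for a block. */
    interface NameNode {
        List<String> getBlockLocations(String block);
    }

    /** Stand-in for the proxy provider: routes reads to the observer by default. */
    static class ProxyProvider {
        private final NameNode observer;
        private final NameNode active;
        private boolean useActive = false; // hypothetical flag set by the reader

        ProxyProvider(NameNode observer, NameNode active) {
            this.observer = observer;
            this.active = active;
        }

        void setUseActive(boolean useActive) {
            this.useActive = useActive;
        }

        NameNode current() {
            return useActive ? active : observer;
        }
    }

    /** Chooses a DataNode; on a missing block, retries once against the active. */
    static String chooseDataNode(ProxyProvider provider, String block)
            throws MissingBlockException {
        try {
            return pickFirst(provider.current().getBlockLocations(block), block);
        } catch (MissingBlockException e) {
            provider.setUseActive(true); // ask the provider to try the active
            try {
                return pickFirst(provider.current().getBlockLocations(block), block);
            } finally {
                provider.setUseActive(false); // later reads go to the observer again
            }
        }
    }

    private static String pickFirst(List<String> locations, String block)
            throws MissingBlockException {
        if (locations.isEmpty()) {
            throw new MissingBlockException("No live DataNode for " + block);
        }
        return locations.get(0);
    }

    public static void main(String[] args) throws IOException {
        // The observer has not seen any block reports yet; the active has.
        NameNode observer = block -> Collections.emptyList();
        NameNode active = block -> Arrays.asList("dn1:9866", "dn2:9866");
        ProxyProvider provider = new ProxyProvider(observer, active);
        System.out.println(chooseDataNode(provider, "blk_1")); // prints "dn1:9866"
    }
}
```

In this sketch the flag is reset after the retry so that subsequent reads return to the observer; whether the real change should keep that behavior is a design choice for the patch, not something this thread settles.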



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


