hadoop-hdfs-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9103) Retry reads on DN failure
Date Mon, 16 Nov 2015 18:35:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007075#comment-15007075
] 

Haohui Mai commented on HDFS-9103:
----------------------------------

{code}
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/include/libhdfspp/bad_datanode_tracker.h
{code}

It can be moved into {{bindings/c}} instead of being a public header.

{code}
+/**
+ * A node exclusion rule provides a simple way of testing if the
+ * client should attempt to connect to a node based on the node's
+ * UUID.  The FileSystem and FileHandle use the BadDataNodeTracker
+ * by default.  AsyncPreadSome takes an optional NodeExclusionRule
+ * that will override the BadDataNodeTracker.
+ **/
+class NodeExclusionRule {
+ public:
+  virtual bool IsBadNode(const std::string& node_uuid) = 0;
+};
+
{code}

The definition needs to be in {{hdfs.h}}.
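For illustration, a caller-supplied override implementing this interface might look like the following sketch. The {{ExcludeGivenNode}} name and the standalone setup are hypothetical, not part of the patch; only the {{NodeExclusionRule}} interface itself comes from the quoted hunk.

```cpp
#include <memory>
#include <string>
#include <utility>

// Mirrors the NodeExclusionRule interface quoted above.
class NodeExclusionRule {
 public:
  virtual ~NodeExclusionRule() = default;
  virtual bool IsBadNode(const std::string &node_uuid) = 0;
};

// Hypothetical override: exclude exactly one known-bad datanode,
// bypassing the default BadDataNodeTracker for a single read.
class ExcludeGivenNode : public NodeExclusionRule {
 public:
  explicit ExcludeGivenNode(std::string uuid) : uuid_(std::move(uuid)) {}
  bool IsBadNode(const std::string &node_uuid) override {
    return node_uuid == uuid_;
  }

 private:
  std::string uuid_;
};
```

A caller could then pass {{std::make_shared<ExcludeGivenNode>("dn-uuid")}} as the optional rule to {{AsyncPreadSome}} to override the tracker for one call.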

{code}
+      /* This cast is temporary until HDFS-9144 lands */
+      InputStreamImpl *impl =
+          static_cast<InputStreamImpl *>(input_stream_.get());
{code}

The comment is redundant. The cast itself is acceptable, as we do assume that {{InputStreamImpl}}
is the only class inheriting the {{InputStream}} interface.

{code}
+  /**
+   * If optional_rule_override is null then use the bad_datanode_tracker.  If
+   *non-null
+   * use the provided NodeExclusionRule to determine eligible datanodes.
+   **/

+                      std::shared_ptr<NodeExclusionRule> optional_rule_override,
{code}

It's clearer to call the parameter {{excluded_nodes}}.

{code}
+Status InputStreamImpl::SelectBlockAndNode(
+    size_t offset, std::shared_ptr<NodeExclusionRule> optional_exclude_rule,
+    const ::hadoop::hdfs::LocatedBlockProto **block,
+    const ::hadoop::hdfs::DatanodeInfoProto **selected_dn) {

+  Status TEST_SelectBlockAndNode(
+      size_t offset, std::shared_ptr<NodeExclusionRule> rule,
+      const ::hadoop::hdfs::LocatedBlockProto **block, /* out */
+      const ::hadoop::hdfs::DatanodeInfoProto **selected_dn /* out */);
{code}

Let's simply inline the function. Protobuf objects might contain heap-allocated members, so it's important
to make their life cycle explicit.

The unit test can be done through traits. Leaving this part out and having some tests on
the bad node tracker itself is also acceptable.

{code}
+  const DatanodeInfoProto *chosen_dn = nullptr;
+  for (int i = 0; i < it->locs_size(); ++i) {
+    const auto &di = it->locs(i);
+    if (!rule->IsBadNode(di.id().datanodeuuid())) {
+      chosen_dn = &di;
+      break;
+    }
+  }
{code}

Can be simplified with {{std::find_if()}}.
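To sketch the suggestion: with {{std::find_if()}} the loop collapses to a single call. The {{DatanodeInfo}} struct and {{ChooseDatanode}} helper below are simplified stand-ins for the protobuf types, not the patch's code; the real loop reads the UUID via {{di.id().datanodeuuid()}} and consults {{rule->IsBadNode()}}.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Simplified stand-in for DatanodeInfoProto; the patch reads the UUID
// via di.id().datanodeuuid().
struct DatanodeInfo {
  std::string uuid;
};

// Equivalent of the quoted loop: return the first datanode the rule
// does not reject, or nullptr when every replica is excluded.
const DatanodeInfo *ChooseDatanode(
    const std::vector<DatanodeInfo> &locs,
    const std::function<bool(const std::string &)> &is_bad_node) {
  auto found = std::find_if(locs.begin(), locs.end(),
                            [&](const DatanodeInfo &di) {
                              return !is_bad_node(di.uuid);
                            });
  return found == locs.end() ? nullptr : &*found;
}
```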

> Retry reads on DN failure
> -------------------------
>
>                 Key: HDFS-9103
>                 URL: https://issues.apache.org/jira/browse/HDFS-9103
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Bob Hansen
>            Assignee: James Clampffer
>             Fix For: HDFS-8707
>
>         Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, HDFS-9103.HDFS-8707.006.patch,
HDFS-9103.HDFS-8707.007.patch, HDFS-9103.HDFS-8707.008.patch, HDFS-9103.HDFS-8707.009.patch,
HDFS-9103.HDFS-8707.3.patch, HDFS-9103.HDFS-8707.4.patch, HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
