hadoop-hdfs-issues mailing list archives

From "Chen Liang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11886) Ozone : improve error handling for putkey operation
Date Fri, 26 May 2017 17:23:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026534#comment-16026534 ]

Chen Liang commented on HDFS-11886:

Thanks [~anu] for looking at this! No decision has been made on this JIRA yet, so any thoughts
are more than welcome.

To make sure we are on the same page: did you mean that the client could send a "commit"
message to KSM after the key is written to the datanode, and only then KSM writes the key to ksm.db?

If I understand this correctly, one consequence of this approach is that every successful
putKey is guaranteed to make two calls to KSM: one to allocate the block and one to commit
the key. If putKey fails, there is no commit, so only the first call is made. With the
revert-failure approach, a successful putKey always makes one call to KSM (to allocate the
block), while a failed putKey makes two (the allocation, then the revert of the key).
Assuming putKey is more likely to succeed than to fail, this seems to me a +1 for revert-failure.
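
To put rough numbers on the call counts above, here is a minimal sketch. It only counts
KSM calls per putKey as described in this comment; the function name and the single
success-probability parameter are my own assumptions for illustration, not anything in Ozone.

```python
def expected_ksm_calls(p_success: float) -> dict:
    """Expected number of KSM calls per putKey under each approach.

    p_success is an assumed probability that steps 2 and 3 of putKey succeed.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be in [0, 1]")
    p_fail = 1.0 - p_success
    return {
        # commit-success: success = allocate + commit (2), failure = allocate only (1)
        "commit_success": 2 * p_success + 1 * p_fail,
        # revert-failure: success = allocate only (1), failure = allocate + revert (2)
        "revert_failure": 1 * p_success + 2 * p_fail,
    }
```

Under this simple model the two approaches break even when putKey succeeds half the time,
and revert-failure makes fewer calls whenever success is more likely than failure.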

However, another question is how we can be sure a key is finalized. For the commit-success
approach this seems easy: unless the success flag is set, the key is considered not ready (similar
to under construction). For the revert-failure approach, there is a temporary window in which
a key has actually failed but, before it is reverted, may already have been read by someone. So
this seems a +1 for the commit-success approach.
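
The window can be illustrated with a toy in-memory model. This is only a sketch: ToyKsm and
its methods (allocate_block, commit_key, delete_key, get_key) are hypothetical stand-ins I
made up for this example, not KSM's actual interface, and the datanode write is reduced to a
comment.

```python
class ToyKsm:
    """In-memory stand-in for KSM's key table (hypothetical, not Ozone's API)."""

    def __init__(self, require_commit: bool):
        self.require_commit = require_commit
        self.keys = {}   # key name -> {"block": str, "committed": bool}
        self.calls = 0   # counts write-path calls only, not get_key

    def allocate_block(self, key):
        self.calls += 1
        # commit-success: the entry starts out uncommitted (not readable).
        # revert-failure: the entry is readable as soon as it is allocated.
        self.keys[key] = {"block": "block-for-" + key,
                          "committed": not self.require_commit}
        return self.keys[key]["block"]

    def commit_key(self, key):
        self.calls += 1
        self.keys[key]["committed"] = True

    def delete_key(self, key):
        self.calls += 1
        self.keys.pop(key, None)

    def get_key(self, key):
        entry = self.keys.get(key)
        if entry is None or not entry["committed"]:
            return None
        return entry["block"]


def failed_put_seen_by_reader(ksm, key="k1"):
    """Simulate a putKey whose datanode write fails, and report whether a
    getKey issued before any cleanup would have returned the key."""
    ksm.allocate_block(key)
    # ... the write to the datanode fails here ...
    seen = ksm.get_key(key) is not None   # a reader arriving inside the window
    if not ksm.require_commit:
        ksm.delete_key(key)               # revert-failure cleanup call
    return seen
```

In this model, failed_put_seen_by_reader(ToyKsm(require_commit=False)) reports that the
reader saw the failed key, while the require_commit=True variant does not. Note that in the
commit-success variant the uncommitted entry is still left in the table, so something would
eventually have to clean it up.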

In short, this probably comes down to whether we favor fewer RPC calls or a reliable
getKey at any time.

> Ozone : improve error handling for putkey operation
> ---------------------------------------------------
>                 Key: HDFS-11886
>                 URL: https://issues.apache.org/jira/browse/HDFS-11886
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Chen Liang
> Ozone's putKey operation involves a few steps:
> 1. KSM calls allocateBlock on SCM and writes this info to KSM's local metastore.
> 2. The allocated block is returned to the client; the client checks whether the container
> needs to be created on the datanode and, if so, creates the container.
> 3. The client writes the data to the container.
> It is possible that 1 succeeds but 2 or 3 fails. In this case there will be an entry
> in KSM's local metastore, but the key is actually nowhere to be found. We need to revert 1
> if 2 or 3 failed in this case.
> To resolve this, we need at least two things to be implemented first:
> 1. We need deleteKey() to be added to KSM.
> 2. We also need container reports to be implemented, so that SCM can track whether
> the container is actually added.
