hadoop-hdfs-issues mailing list archives

From "Weiwei Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11886) Ozone : improve error handling for putkey operation
Date Wed, 31 May 2017 05:46:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030676#comment-16030676 ]

Weiwei Yang commented on HDFS-11886:

Thanks [~vagarychen] for raising this problem and [~anu] for the design doc.
Let me know if I understand this correctly: the proposal adds a *ksm-keys-under-progress.db*
in KSM, and only when all the steps finish successfully is a key moved from *ksm-keys-under-progress.db*
to *ksm.db*. This introduces additional writes to disk:

  # put key to inprogress db -> add key
  # delete key in inprogress db -> commit key 1
  # add key to ksm db -> commit key 2
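The three writes above can be sketched as follows. This is a hypothetical API for illustration only; the actual KSM metadata stores are LevelDB-backed, and none of these class or method names come from the design doc.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the proposed two-phase putKey commit. The two maps stand in
 * for ksm-keys-under-progress.db and ksm.db; every call on them would be
 * a disk write in the real proposal.
 */
public class PutKeyCommitSketch {
    private final Map<String, String> inProgressDb = new HashMap<>();
    private final Map<String, String> ksmDb = new HashMap<>();

    /** Write 1: record the key as in-progress before any datanode I/O. */
    public void addKey(String key, String blockInfo) {
        inProgressDb.put(key, blockInfo);
    }

    /** Writes 2 and 3: on success, remove from in-progress and commit. */
    public void commitKey(String key) {
        String blockInfo = inProgressDb.remove(key);
        if (blockInfo != null) {
            ksmDb.put(key, blockInfo);
        }
    }

    /** Only committed keys are visible in the KSM namespace. */
    public boolean isVisible(String key) {
        return ksmDb.containsKey(key);
    }
}
```

If KSM crashes between write 1 and write 3, the orphaned entry sits in the in-progress db and never appears in the visible namespace, which is what the extra writes buy.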

Do we really need to persist this? Can we store the state in memory only? Only if all steps
succeed do we commit to *ksm.db*; otherwise dispose of it. If KSM crashes before a key is committed,
that key won't be written to the KSM namespace, because the cache will be gone after KSM restarts.
This is like a write cache in front of ksm.db.
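The in-memory alternative could look something like the sketch below. Again, the names are illustrative, not actual KSM classes; the point is that *ksm.db* sees exactly one write per successful key, and a crash discards pending keys for free.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of an in-memory write cache in front of ksm.db. Pending keys
 * live only in the cache; a KSM crash has the same effect as abort().
 */
public class InMemoryPutKeyCache {
    private final Map<String, String> pending = new ConcurrentHashMap<>();
    private final Map<String, String> ksmDb = new ConcurrentHashMap<>(); // stands in for ksm.db

    /** Track the key in memory only; no disk write. */
    public void beginPut(String key, String blockInfo) {
        pending.put(key, blockInfo);
    }

    /** All steps succeeded: the single persisted write. */
    public void commit(String key) {
        String info = pending.remove(key);
        if (info != null) {
            ksmDb.put(key, info);
        }
    }

    /** A step failed: drop the pending entry, nothing ever hit disk. */
    public void abort(String key) {
        pending.remove(key);
    }

    public boolean isCommitted(String key) {
        return ksmDb.containsKey(key);
    }
}
```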

Another question: why do we need to return a flag to OzoneHandler to determine whether a container
needs to be created? I am wondering why we need these additional RPC calls; why not let SCM
create the container on datanodes if necessary and simply return an open container to the client?
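The simplified flow suggested above might look like this sketch, where SCM hides container creation behind allocation and no "needs creation" flag travels back through OzoneHandler. All names here are hypothetical, not actual SCM APIs.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch: SCM creates the container on datanodes itself when it does
 * not exist yet, so allocateBlock always returns an open container.
 */
public class ScmAllocatorSketch {
    public static class OpenContainer {
        public final String name;
        OpenContainer(String name) { this.name = name; }
    }

    private final Set<String> existingContainers = new HashSet<>();

    public OpenContainer allocateBlock(String containerName) {
        // Instead of returning a flag for the client to act on,
        // create the container here if it is missing.
        if (!existingContainers.contains(containerName)) {
            createOnDatanodes(containerName);
            existingContainers.add(containerName);
        }
        return new OpenContainer(containerName);
    }

    private void createOnDatanodes(String name) {
        // placeholder for the SCM -> datanode createContainer call
    }
}
```

The trade-off is that SCM takes on the datanode round trip during allocation, but the client-side check-and-create RPCs disappear.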


> Ozone : improve error handling for putkey operation
> ---------------------------------------------------
>                 Key: HDFS-11886
>                 URL: https://issues.apache.org/jira/browse/HDFS-11886
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Chen Liang
>         Attachments: design-notes-putkey.pdf
> Ozone's putKey operation involves a couple of steps:
> 1. KSM calls allocateBlock to SCM and writes this info to KSM's local metastore.
> 2. The allocatedBlock gets returned to the client; the client checks whether the container
needs to be created on the datanode and, if so, creates it.
> 3. The client writes the data to the container.
> It is possible that 1 succeeded but 2 or 3 failed; in this case there will be an entry
in KSM's local metastore, but the key is actually nowhere to be found. We need to revert 1
if 2 or 3 failed.
> To resolve this, we need at least two things to be implemented first:
> 1. deleteKey() needs to be added to KSM.
> 2. Container reports need to be implemented so that SCM can track whether
the container was actually added.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
