hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4979) Implement retry cache on the namenode
Date Mon, 15 Jul 2013 18:46:51 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HDFS-4979:
----------------------------------

    Attachment: HDFS-4979.2.patch

When an operation completes successfully, the retry cache is populated with the request information
and any payload needed to generate the response for a retried request. We need to handle
the following use cases:
# A client makes a request and the operation completes on the namenode, but the client does not
get the response and retries the request.
#* In this case, since the operation is complete on the namenode, the operation is recorded
in the retry cache. That means the retry cache can be checked and the retry can be handled outside
the namesystem lock.
# A client makes a request while the operation is still in progress on the namenode, gets
disconnected for some reason, and retries the request.
#* In this case, since the operation is still in progress on the namenode, the operation is
not recorded in the retry cache. That means the retry cache *must* be checked and the retry can be
handled only inside the namesystem lock.

Given the second case, I plan to do the retry checks inside the lock.
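
The two scenarios above suggest a cache keyed by (client ID, call ID) with an explicit in-progress state, so that a retry of a still-running operation can be detected under the lock. A minimal sketch of that idea (class, field and method names are illustrative, not the actual patch's API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of a retry cache keyed by (clientId, callId); names are illustrative. */
class RetryCacheSketch {
    enum State { IN_PROGRESS, COMPLETED }

    static final class Entry {
        final State state;
        final Object payload;   // data needed to regenerate the response on retry
        Entry(State state, Object payload) { this.state = state; this.payload = payload; }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    private static String key(String clientId, int callId) { return clientId + ":" + callId; }

    /** Called inside the namesystem lock: returns the entry for a retried request, or null. */
    Entry checkRetry(String clientId, int callId) {
        return cache.get(key(clientId, callId));
    }

    /** Record an in-progress operation so a disconnected client's retry can be detected. */
    void markInProgress(String clientId, int callId) {
        cache.put(key(clientId, callId), new Entry(State.IN_PROGRESS, null));
    }

    /** On success, store whatever payload is needed to regenerate the response for a retry. */
    void markCompleted(String clientId, int callId, Object payload) {
        cache.put(key(clientId, callId), new Entry(State.COMPLETED, payload));
    }
}
```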

The following conditions need to be handled by the retry cache for various operations.
*File creation*
# The retry cache has <RPC request client ID + call ID, inode ID of the file created in the
previous attempt>.
# Between two retries the following can happen; here is how each case is handled:
#* The file created in attempt 1 was modified (new permissions, etc.). I plan to just return the
current, changed HdfsFileStatus in create.
#* The file created in attempt 1 was deleted. The second retry will create a new file.
#* The file created in attempt 1 was deleted and a new file has been created. The retry cache
entry is not used and a new attempt to create a file is made, which fails as expected.
#* The current patch does not handle the case where, between retries, the file got closed due
to lease timeout or an explicit recoverLease call. In such a case, getting a subsequent additional
block fails.
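
The create cases above boil down to one check on the cached inode ID: if the inode still exists, return its current status; otherwise fall through to a fresh create. A hedged sketch, using a plain map as a stand-in for the namespace (names and types are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the create-retry decision; not the real namenode code. */
class CreateRetrySketch {
    /**
     * cachedInodeId: inode ID stored by the previous successful attempt (null if none).
     * inodeTable: stand-in for the namespace, mapping inode ID to the file's current status.
     */
    static String handleCreateRetry(Long cachedInodeId, Map<Long, String> inodeTable, String path) {
        if (cachedInodeId != null && inodeTable.containsKey(cachedInodeId)) {
            // File from attempt 1 still exists (possibly modified meanwhile):
            // return its current status, not the one from the original attempt.
            return inodeTable.get(cachedInodeId);
        }
        // File was deleted (and possibly recreated under the same path): ignore the
        // cache entry and attempt a fresh create, which may fail as expected.
        return "CREATE:" + path;
    }
}
```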

*File Append*
# The retry cache has <RPC request client ID + call ID, block ID and generation stamp of the
previously returned LocatedBlock>.
# Between two retries the following can happen; here is how each case is handled:
#* If the previous append attempt returned null, null is returned again, irrespective of whether
the file being appended was deleted between retries. If the file was deleted between retries, the
next attempt to get an additional block fails.
#** Append returns null when the last block of the file is complete.
#* If the file being appended to in the previous attempt was deleted, the block no longer exists.
The retry cache entry is not used and a new attempt to append to the file is made.
#* The current patch does not handle the case where, between retries, the file got closed due
to lease timeout or an explicit recoverLease call. In such a case, appending to the block fails.
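
The append cases can be sketched as regenerating the response from the cached (block ID, generation stamp) rather than storing the LocatedBlock itself. All names below are illustrative; "RETRY_AS_NEW" stands in for falling through to a fresh append attempt:

```java
import java.util.Set;

/** Sketch of regenerating append's response from the cached payload; illustrative only. */
class AppendRetrySketch {
    static final class Payload {
        final long blockId, genStamp;
        Payload(long blockId, long genStamp) { this.blockId = blockId; this.genStamp = genStamp; }
    }

    /**
     * Returns the regenerated response, "RETRY_AS_NEW" when the cache entry is
     * unusable, or null when the previous attempt itself returned null.
     */
    static String regenerate(Payload cached, boolean nullWasReturned, Set<Long> liveBlocks) {
        if (nullWasReturned) {
            return null;            // last block was complete: return null again
        }
        if (cached == null || !liveBlocks.contains(cached.blockId)) {
            return "RETRY_AS_NEW";  // block gone: fall through to a fresh append attempt
        }
        return "LocatedBlock{id=" + cached.blockId + ",gs=" + cached.genStamp + "}";
    }
}
```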

The above two cases are different because the return type is a complex object. Hence,
instead of storing the object in the retry cache, it is regenerated during retry attempts. In the
following cases the return type is a simple object, which results in simple handling with
just void, String or boolean returned.

Alternatively, at the expense of more memory, we could track the returned response for create and
append and simplify the code further. Any thoughts?

*Concat, createSymlink, renameTo (both the variants), delete, createSnapshot, deleteSnapshot
etc*
# The retry cache has <RPC request client ID + call ID>.
# If a retry cache entry is found, the call immediately returns with void, boolean or String.
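
For these simple operations the cached payload is the response itself, so a retry short-circuits before re-executing the operation. A minimal sketch under that assumption (the helper name and map-based cache are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch of the simple case: the cached payload *is* the response. */
class SimpleRetrySketch {
    static boolean withRetryCache(Map<String, Boolean> cache, String key, Supplier<Boolean> op) {
        Boolean prior = cache.get(key);
        if (prior != null) {
            return prior;            // retried request: return the recorded result immediately
        }
        boolean result = op.get();   // first attempt: execute the operation
        cache.put(key, result);
        return result;
    }
}
```

Calling the same (client ID + call ID) key twice runs the underlying operation only once, which is exactly the idempotence the retry cache is meant to provide.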

*Still pending*
# updatePipeline()
# rollEditLog()
# endCheckpoint()
# commitBlockSynchronization()

                
> Implement retry cache on the namenode
> -------------------------------------
>
>                 Key: HDFS-4979
>                 URL: https://issues.apache.org/jira/browse/HDFS-4979
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: HDFS-4979.1.patch, HDFS-4979.2.patch, HDFS-4979.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
