hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4872) Idempotent delete operation.
Date Wed, 05 Jun 2013 00:18:22 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675458#comment-13675458 ]

Aaron T. Myers commented on HDFS-4872:
--------------------------------------

I think we should again seriously consider implementing the duplicate request cache to address
all of these problems with one solution. The stated drawbacks of this were "complexity, performance,
and RAM overhead."

Regarding performance, I don't see how a constant-time lookup in an in-memory hash map could
be slower or more taxing on the NN than the alternative suggestions, which have the client make
two totally separate RPCs.

As for RAM overhead, the math that Todd did in [this comment|https://issues.apache.org/jira/browse/HDFS-4849?focusedCommentId=13670538&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13670538]
suggests that RAM usage should be basically negligible versus the rest of the NN heap. We
should also probably store the results of the calls in the duplicate request cache, so we
can return those to the client on a retried call, but from a quick review of the non-idempotent
operations in HDFS, all of those return values are constant in size and none of them should
be very large:

{noformat}
// Commonly-used non-idempotent client operations.
HdfsFileStatus create
void abandonBlock
boolean rename
void rename2
boolean delete
void updatePipeline

// Less commonly-used non-idempotent client operations.
void createSymlink
void concat
LocatedBlock append
void cancelDelegationToken
DataEncryptionKey getDataEncryptionKey

// Mostly administrative commands, unlikely to be used by most clients.
void saveNamespace
boolean restoreFailedStorage
void refreshNodes
void finalizeUpgrade
void metaSave
{noformat}

As for complexity, I'm honestly not sure what complexity folks are concerned about. Generating
GUIDs on clients is well established; we already log all FS metadata mutations to disk, so
adding those GUIDs shouldn't be too tough; and adding an in-memory hash lookup on the NN
should be pretty straightforward. We can likely reuse the lightweight hash sets we already
use elsewhere in HDFS for the cache itself.
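
To make the idea concrete, here is a rough sketch of what the in-memory part could look like. This is not existing HDFS code: the class name DuplicateRequestCache, the execute method, and the size-bounded eviction policy are all assumptions made for illustration, and it uses a plain LinkedHashMap rather than the HDFS lightweight hash sets. It also leaves out logging the GUIDs to the edit log, which would still be needed for the cache to survive a failover.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a duplicate request cache (not HDFS code).
 * The client attaches a GUID to each non-idempotent call; the server
 * remembers the result so a retried call returns it without re-executing.
 */
class DuplicateRequestCache {
    // Insertion-ordered map; the oldest entry is evicted once the
    // assumed size bound is exceeded. Null values are allowed, so
    // void operations can be recorded as well.
    private final Map<UUID, Object> results;

    DuplicateRequestCache(final int maxEntries) {
        this.results = new LinkedHashMap<UUID, Object>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<UUID, Object> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Runs the operation the first time a request ID is seen and caches
     * its result; a retry with the same ID gets the cached result back.
     */
    @SuppressWarnings("unchecked")
    synchronized <T> T execute(UUID requestId, Supplier<T> operation) {
        if (results.containsKey(requestId)) {
            return (T) results.get(requestId);   // retried call: replay result
        }
        T result = operation.get();              // first call: do the work
        results.put(requestId, result);
        return result;
    }
}
```

With something like this, a client that retries a delete with the same GUID would get back the original "true" rather than "false" from a second, failed delete, while a call with a fresh GUID would execute normally.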

The stated reason for preferring the rename-to-tmp-then-delete scheme versus the get-inode-id-then-delete
scheme is to support versions of HDFS which do not yet have INode ID support. A duplicate
request cache should work in either case.
                
> Idempotent delete operation.
> ----------------------------
>
>                 Key: HDFS-4872
>                 URL: https://issues.apache.org/jira/browse/HDFS-4872
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.4-alpha
>            Reporter: Konstantin Shvachko
>
> Making delete idempotent is important to provide uninterrupted job execution in case
> of HA failover.
> This is to discuss different approaches to idempotent implementation of delete.

