hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13936) S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation
Date Mon, 08 Oct 2018 18:04:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642237#comment-16642237 ]

Steve Loughran commented on HADOOP-13936:
-----------------------------------------

Reviewing this with the goal of fixing it.

Options
# as innerDelete/innerRename delete objects, send the deletes down in batches, *maybe in their own thread*
# make sure all that cleanup is done in a finally clause, and hope the actual execution never fails (which is really the problem we are trying to address)
# have the metastore take a set of files to delete, knowing that it is part of a larger bulk rename or delete operation, giving it the option of being clever

I'm thinking of option 3: the caller initiates some multi-object operation (delete? rename?) on the metastore and gets a context object back, which it updates as it goes along and finally calls complete() on.

{code}
bulkDelete = s3guard.initiateBulkDelete(path)
// ...iterate through listings; with every batch of deletes:
bulkDelete.deleted(List<Path>)
// and then finally:
bulkDelete.complete()
{code}
Naive implementation: ignore the deleted() ops and do what happens today in complete(), i.e. delete the tree.
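As a sketch of what such a context object could look like (the interface name and methods here are illustrative only, not an existing S3Guard API); the naive implementation would simply ignore {{deleted()}} and do everything in {{complete()}}:

{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch of the context object a metastore could hand back
 * from initiateBulkDelete(); the name and methods are illustrative only.
 */
public interface BulkDeleteOperation extends Closeable {

  /** Record a batch of paths whose S3 objects have just been deleted. */
  void deleted(List<Path> paths) throws IOException;

  /** All batches done: finish the metastore-side update. */
  void complete() throws IOException;
}
{code}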

Clever implementation: on each deleted() batch, kick off the deletion of those objects (wrapped in a duration log); in the complete() call, do a final cleanup treewalk to get rid of parent entries.
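A minimal sketch of that cleverer variant, assuming a couple of made-up store-side methods for the async removal and the parent-entry treewalk; nothing here is an existing MetadataStore API:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.hadoop.fs.Path;

/** Hypothetical store-side hooks; not an existing MetadataStore API. */
interface BulkStoreOps {
  /** Remove these entries in the background, e.g. on a worker thread. */
  Future<?> removeEntriesAsync(List<Path> paths);
  /** Treewalk under root, deleting parent entries with no children left. */
  void pruneEmptyParents(Path root) throws IOException;
}

/** Sketch of the "clever" bulk-delete context. */
class CleverBulkDelete {
  private final BulkStoreOps store;
  private final Path root;
  private final List<Future<?>> pending = new ArrayList<>();

  CleverBulkDelete(BulkStoreOps store, Path root) {
    this.store = store;
    this.root = root;
  }

  /** Each batch of S3 deletes kicks off the matching entry removal. */
  void deleted(List<Path> paths) {
    pending.add(store.removeEntriesAsync(paths));
  }

  /** Wait for the batches, then do the final cleanup treewalk. */
  void complete() throws IOException {
    for (Future<?> f : pending) {
      try {
        f.get();
      } catch (InterruptedException | ExecutionException e) {
        throw new IOException("bulk delete update failed", e);
      }
    }
    store.pruneEmptyParents(root);
  }
}
{code}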

The move operation would be similar, only as it does updates in batches, it could also track which parent directories had already been created across batches, so there'd be no duplication of parent dir creation.
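For the rename path, that batch-spanning bookkeeping could be as simple as a set of already-created ancestors; the class below is made up purely for illustration:

{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.fs.Path;

/** Illustrative only: remembers which parent dirs were already created. */
class AncestorTracker {
  private final Set<Path> created = new HashSet<>();

  /**
   * Return the ancestors of dest which no earlier batch has created yet,
   * marking them as created so later batches skip them.
   */
  List<Path> newAncestors(Path dest) {
    List<Path> toCreate = new ArrayList<>();
    for (Path p = dest.getParent(); p != null && !p.isRoot(); p = p.getParent()) {
      if (!created.add(p)) {
        break;   // this ancestor, and so everything above it, is already done
      }
      toCreate.add(p);
    }
    return toCreate;
  }
}
{code}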

On the topic of batches, these updates could also be done in a (single) worker thread within S3AFileSystem, so that even throttled DDB operations wouldn't take up time which copy calls could use.
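A sketch of that single-worker-thread idea, using a plain single-threaded executor; where it would actually live inside S3AFileSystem is left open:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: serialize metastore updates on one background thread. */
class MetastoreUpdateQueue {
  private final ExecutorService worker = Executors.newSingleThreadExecutor();
  private final List<Future<Void>> pending = new ArrayList<>();

  /** Queue an update; the caller goes straight back to its S3 copy/delete calls. */
  synchronized void enqueue(Callable<Void> update) {
    pending.add(worker.submit(update));
  }

  /** Block until every queued update has finished, rethrowing the first failure. */
  synchronized void awaitCompletion() throws Exception {
    for (Future<Void> f : pending) {
      f.get();
    }
    pending.clear();
    worker.shutdown();
  }
}
{code}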

(Also while doing this: log the duration of copies at debug; print out the duration & total effective bandwidth. These are things we need to know, and it'd give us a before/after benchmark of any changes.)
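On that measurement point, the sort of thing meant is no more than a debug-level timer around each copy; the helper below is illustrative, not existing S3A code:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Illustrative only: log copy duration and effective bandwidth at debug. */
final class CopyStats {
  private static final Logger LOG = LoggerFactory.getLogger(CopyStats.class);

  private CopyStats() {
  }

  static void logCopy(String key, long bytes, long startNanos) {
    double seconds = (System.nanoTime() - startNanos) / 1e9;
    double mbPerSec = seconds > 0 ? bytes / (1024.0 * 1024.0) / seconds : 0.0;
    LOG.debug("Copied {} ({} bytes) in {} s, effective bandwidth {} MB/s",
        key, bytes,
        String.format("%.3f", seconds),
        String.format("%.2f", mbPerSec));
  }
}
{code}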



> S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-13936
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13936
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1, 3.1.0, 3.1.1
>            Reporter: Rajesh Balamohan
>            Assignee: Steve Loughran
>            Priority: Blocker
>
> As part of the {{S3AFileSystem.delete}} operation, {{innerDelete}} is invoked, which deletes keys from S3 in batches (default is 1000). But DynamoDB is updated only at the end of this operation. This can cause issues when deleting a large number of keys.
> E.g., it is possible to get an exception after deleting 1000 keys, and in such cases DynamoDB would not be updated. This can cause DynamoDB to go out of sync.


