hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13760) S3Guard: add delete tracking
Date Fri, 21 Apr 2017 21:24:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979405#comment-15979405
] 

Aaron Fabbri commented on HADOOP-13760:
---------------------------------------

Ok.. looking at v2 patch now.  Just to set expectations, this is a complex little feature
and I expect it to take some time to get right.  Some initial questions here:

1. It looks like you make significant changes to rename.

- This makes me nervous (rename should make anybody nervous, it is healthy).  I'd suggest
we break it out to separate "pre-refactor" patch if we still need it.  In general the more
we can split out pre-refactoring the better.  It can be done later when the patch is closer
to complete.

- You change from iterating over S3's ObjectListing (which is subset of actual directory subtree
vertices), to iterating over actual directory tree via listFilesAndDirectories().  I think
the problem here is that you will be (A) deleting directory blobs that don't exist on S3,
and (B) creating directory blobs that *should not* exist on S3.  I'd expect this rename code
to cause destination subtrees to disappear, as those directory keys you write would be interpreted
as empty dirs by S3A later on.

*If* I'm correct on the last point, I hope there is a integration or unit test that fails
with this code.  If not we should create one.  

2. In innerMkdirs(), i think the try/catch block around your checkPathForDirectory() is no
longer needed.  You use return values, not FNF exception.

3. Switch statements should probably just be if / else in these cases.  Any savings you had
were given up having to add a fallthrough findbugs exclusion IMO.

I'm continuing to look at the patch but wanted to get you some early feedback.


> S3Guard: add delete tracking
> ----------------------------
>
>                 Key: HADOOP-13760
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13760
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-13760-HADOOP-13345.001.patch, HADOOP-13760-HADOOP-13345.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add delete
tracking.
> Current behavior on delete is to remove the metadata from the MetadataStore.  To make
deletes consistent, we need to add a {{isDeleted}} flag to {{PathMetadata}} and check it when
returning results from functions like {{getFileStatus()}} and {{listStatus()}}.  In HADOOP-13651,
I added TODO comments in most of the places these new conditions are needed.  The work does
not look too bad.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message