hadoop-common-issues mailing list archives

From "Sean Mackrory (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13760) S3Guard: add delete tracking
Date Fri, 21 Apr 2017 20:31:04 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-13760:
-----------------------------------
    Attachment: HADOOP-13760-HADOOP-13345.002.patch

[~fabbri] - Just the unit tests with Null, Local, and Dynamo implementations. I'm also getting
an encryption test and the one after it failing - I haven't entirely looked into it yet, but
they succeed in isolation so I'm assuming it's HADOOP-14305. As you pointed out offline, -Dlocal
doesn't do anything, but because Local is the default it still ran the tests as I intended.
And it definitely exercised all 3 implementations, because I saw failures clearly related
to each one that I had to fix. I'm getting ready to run some actual workloads on an actual
cluster, too.

[~stevel@apache.org] - schema versioning aside, this would cause clusters running the old
code to continue including deleted items in listings. So it effectively prolongs the inconsistency
I'm trying to eliminate until the tombstone gets pruned or otherwise removed.
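The tombstone idea can be sketched in a few lines. PathMetadata and the isDeleted flag mirror the names used in the issue description, but this standalone snippet is an illustration of the mechanism, not the actual S3Guard code:

```java
import java.util.ArrayList;
import java.util.List;

public class TombstoneDemo {
    // Simplified stand-in for S3Guard's PathMetadata with an isDeleted flag.
    static class PathMetadata {
        final String path;
        final boolean isDeleted;   // true => tombstone for a deleted entry
        PathMetadata(String path, boolean isDeleted) {
            this.path = path;
            this.isDeleted = isDeleted;
        }
    }

    // Tombstone-aware listing: skip deleted entries so they vanish from results.
    static List<String> listStatus(List<PathMetadata> store) {
        List<String> result = new ArrayList<>();
        for (PathMetadata md : store) {
            if (!md.isDeleted) {
                result.add(md.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<PathMetadata> store = new ArrayList<>();
        store.add(new PathMetadata("/a/file1", false));
        store.add(new PathMetadata("/a/file2", true));  // deleted, tombstoned
        // Old code that ignores isDeleted would still list /a/file2;
        // tombstone-aware listing hides it.
        System.out.println(listStatus(store));  // [/a/file1]
    }
}
```

This is also why old clients prolong the inconsistency: code that ignores the flag keeps returning the tombstoned entry until the record is pruned.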

Attaching another incremental patch. I've implemented the TODO to filter out deleted children
server-side when deciding whether a directory is empty. I'm not sure I like this - the docs
indicate there are limits on the pre-filtering data size that very large directories
may hit. I'm not clear on whether regular queries would hit the same limits, but with large
directories this saves us some network traffic (though not read-bandwidth-against-quotas usage).
I also need to dig into the use of .withMaxResults. In my .001 patch I was applying that
limit before filtering out deletes, so it's only luck / coincidence that tests didn't fail
thinking non-empty directories were empty. I need to add a test to catch that. I'm also not
sure whether that limit applies before or after filtering; if it applies before, I shouldn't
use it.
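The ordering hazard described above can be made concrete with a toy example. The method names and maxResults parameter here are invented for illustration; they are not the DynamoDB SDK API:

```java
import java.util.List;

public class LimitOrderDemo {
    record Entry(String path, boolean isDeleted) {}

    // Buggy: cap results BEFORE dropping tombstones. If the first entry under
    // a directory happens to be a tombstone, the directory looks empty.
    static boolean isEmptyBuggy(List<Entry> children, int maxResults) {
        return children.stream()
                .limit(maxResults)           // limit applied first
                .filter(e -> !e.isDeleted())
                .findAny().isEmpty();
    }

    // Fixed: drop tombstones first, then cap.
    static boolean isEmptyFixed(List<Entry> children, int maxResults) {
        return children.stream()
                .filter(e -> !e.isDeleted())
                .limit(maxResults)
                .findAny().isEmpty();
    }

    public static void main(String[] args) {
        // Directory with one tombstone followed by a live child.
        List<Entry> children = List.of(
                new Entry("/dir/deleted", true),
                new Entry("/dir/live", false));
        System.out.println(isEmptyBuggy(children, 1));  // true  (wrong)
        System.out.println(isEmptyFixed(children, 1));  // false (correct)
    }
}
```

With a limit of 1 and a tombstone sorted first, the buggy ordering reports a non-empty directory as empty, which is exactly the failure mode a test should catch.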

Also added a test that does a circular series of renames, and a few fixes it required.
Most notably, if a directory was created and then renamed quickly enough that S3 didn't yet
return it in lists, we used to throw a FileNotFoundException while trying to decide if it
was empty. We now assume it IS empty.
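The shape of that fix can be sketched as follows; the Listing interface and method names are invented for illustration, not the S3A code:

```java
import java.util.List;
import java.util.Optional;

public class RenameRaceDemo {
    // Stand-in for an S3 listing that may lag behind a just-created directory.
    interface Listing {
        // empty Optional => S3 hasn't returned the directory in lists yet
        Optional<List<String>> childrenOf(String dir);
    }

    // Old behavior: a missing listing threw FileNotFoundException even though
    // the directory had just been created. New behavior: treat it as empty.
    static boolean isEmptyDirectory(Listing listing, String dir) {
        return listing.childrenOf(dir)
                .map(List::isEmpty)   // listing available: check its contents
                .orElse(true);        // not listed yet: assume empty
    }

    public static void main(String[] args) {
        Listing lagging = dir -> Optional.empty();  // S3 list lagging behind
        System.out.println(isEmptyDirectory(lagging, "/fresh"));  // true
    }
}
```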

> S3Guard: add delete tracking
> ----------------------------
>
>                 Key: HADOOP-13760
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13760
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-13760-HADOOP-13345.001.patch, HADOOP-13760-HADOOP-13345.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add delete tracking.
> Current behavior on delete is to remove the metadata from the MetadataStore.  To make deletes
> consistent, we need to add a {{isDeleted}} flag to {{PathMetadata}} and check it when returning
> results from functions like {{getFileStatus()}} and {{listStatus()}}.  In HADOOP-13651, I added
> TODO comments in most of the places these new conditions are needed.  The work does not look
> too bad.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

