hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13756) LocalMetadataStore#put(DirListingMetadata) should also put file metadata into fileHash.
Date Tue, 25 Oct 2016 01:35:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603849#comment-15603849 ]

Aaron Fabbri commented on HADOOP-13756:

Hi [~eddyxu].  Thanks for putting together this good description.  I've been meaning to rewrite
part of LocalMetadataStore for the reason you outline here.  (Tests pass because clients fall
back to the backing store when get(PathMetadata) returns null.  Also getFileStatus() calls
and file creations cause much of the PathMetadata to be recorded.)

Two issues here:

(1) LocalMetadataStore implementation
(2) Design of Interface: Is DirListingMetadata required?

#1. I need to rework the data structures here.  Keeping two copies of each FileStatus is silly.
 The "two hashtables" approach was a quick prototype that needs to be replaced.  Callers of the
MetadataStore interface do not have to do a separate put() for each child in a directory; those
FileStatuses are already included in the put(DirListingMetadata).
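
As a rough illustration of the direction for #1, here is a minimal sketch of a store that keeps
one copy of each entry and still answers per-path lookups after a single batched put.  Entry,
Listing and SingleMapStoreSketch are hypothetical, simplified stand-ins (paths are plain strings),
not the real PathMetadata, DirListingMetadata or LocalMetadataStore.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are NOT the real S3Guard classes.
class SingleMapStoreSketch {

    /** Stand-in for PathMetadata: a path plus a file length. */
    static final class Entry {
        final String path;
        final long length;
        Entry(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Stand-in for DirListingMetadata: children plus an authoritative bit. */
    static final class Listing {
        final String dir;
        final List<Entry> children;
        final boolean authoritative;
        Listing(String dir, List<Entry> children, boolean authoritative) {
            this.dir = dir;
            this.children = children;
            this.authoritative = authoritative;
        }
    }

    // Single copy of every entry, keyed by full path.
    private final Map<String, Entry> entries = new HashMap<>();
    // Per directory: only the child paths, not a second copy of the entries.
    private final Map<String, Set<String>> childPaths = new HashMap<>();
    private final Map<String, Boolean> authoritative = new HashMap<>();

    /** Batched put: records the listing and every child in one call. */
    synchronized void put(Listing listing) {
        Set<String> paths = new LinkedHashSet<>();
        for (Entry child : listing.children) {
            entries.put(child.path, child);
            paths.add(child.path);
        }
        childPaths.put(listing.dir, paths);
        authoritative.put(listing.dir, listing.authoritative);
    }

    /** Per-path lookup; returns null when the path is unknown. */
    synchronized Entry get(String path) {
        return entries.get(path);
    }

    /** Rebuilds a listing from the single entry map; null if never listed. */
    synchronized Listing listChildren(String dir) {
        Set<String> paths = childPaths.get(dir);
        if (paths == null) {
            return null;
        }
        List<Entry> children = new ArrayList<>();
        for (String p : paths) {
            Entry e = entries.get(p);
            if (e != null) {
                children.add(e);
            }
        }
        return new Listing(dir, children, authoritative.getOrDefault(dir, false));
    }
}
```

With this shape, a caller that does only put(listing) can later call get() on any child path
without falling back to the backing store.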

#2. Do we need the "batched" API of put(DirListingMetadata)?  Here was the thought process
so far:

You can think of DirListingMetadata as "results of listStatus() plus an authoritative bit".

I thought about removing DirListingMetadata and just doing put()/get() on PathMetadata for
each directory entry.  Then we need a separate setAuthoritative(path, boolean) function. 
Does this open up new race conditions?

If Client A is putting the results of a listStatus() into the MetadataStore, one by one, then
calling setAuthoritative(parent), while Client B is putting or deleting entries in the same
directory, maybe there is no race there.  Maybe we can think of your proposed setAuthoritative(path,
boolean) function as a marker in time, after which the MetadataStore knows the full contents
of the directory, whereas put(DirListingMetadata, authoritative=true) says "this is the current
snapshot of the full directory contents".
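
To make the "marker in time" reading concrete, here is a small sketch of the per-entry
alternative.  PerEntryStoreSketch and its string paths are assumptions made for illustration;
only the put()/setAuthoritative(path, boolean) shape comes from the proposal.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical per-entry store; an illustration, not S3Guard code.
class PerEntryStoreSketch {
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Set<String> authoritativeDirs = new HashSet<>();

    /** Parent of "/d/a" is "/d"; parent of "/a" is "/". */
    private static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    /** Records one entry under its parent directory. */
    synchronized void put(String path) {
        children.computeIfAbsent(parentOf(path), k -> new TreeSet<>()).add(path);
    }

    /** Removes one entry; a no-op if it was never recorded. */
    synchronized void delete(String path) {
        Set<String> siblings = children.get(parentOf(path));
        if (siblings != null) {
            siblings.remove(path);
        }
    }

    /** Marker in time: from now on the store claims to know the full listing. */
    synchronized void setAuthoritative(String dir, boolean flag) {
        if (flag) {
            authoritativeDirs.add(dir);
        } else {
            authoritativeDirs.remove(dir);
        }
    }

    synchronized boolean isAuthoritative(String dir) {
        return authoritativeDirs.contains(dir);
    }

    /** Returns a sorted copy of the recorded children of dir. */
    synchronized Set<String> list(String dir) {
        return new TreeSet<>(children.getOrDefault(dir, Collections.<String>emptySet()));
    }
}
```

If Client A does put("/d/a"), Client B does put("/d/c"), and Client A then does put("/d/b")
followed by setAuthoritative("/d", true), the store ends up holding all three entries.  In this
sketch B's concurrent put is simply merged in rather than lost.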

If we are implementing directory-level cache invalidation (probably necessary for S3AFileStatus#isEmptyDirectory(),
and maybe as a CLI operation), it could be a little tricky.  If Client A is doing its sequence
{set(child_meta_1), set(child_meta_2), ..., setAuthoritative(parent_path, true)} and Client
B needs to invalidate the parent directory in the middle of that stream, I'm not sure how
that would work.  The DirListingMetadata approach at least makes it possible for implementations
to handle it, even though many (DynamoDB) will likely not handle that case.
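
One way that interleaving might go wrong is sketched below.  It assumes a hypothetical
invalidate(dir) that drops the cached children and clears the authoritative flag; no such method
is defined in this thread, and InvalidationSketch is not S3Guard code.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical store used only to compare the two API shapes under invalidation.
class InvalidationSketch {
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Set<String> authoritativeDirs = new HashSet<>();

    /** Per-entry style: record one child at a time. */
    synchronized void putChild(String dir, String child) {
        children.computeIfAbsent(dir, k -> new TreeSet<>()).add(child);
    }

    synchronized void setAuthoritative(String dir, boolean flag) {
        if (flag) {
            authoritativeDirs.add(dir);
        } else {
            authoritativeDirs.remove(dir);
        }
    }

    /** Assumed semantics: forget everything cached about dir. */
    synchronized void invalidate(String dir) {
        children.remove(dir);
        authoritativeDirs.remove(dir);
    }

    /** Batched style: replace the whole listing and flag in one atomic step. */
    synchronized void putListing(String dir, Collection<String> all, boolean flag) {
        children.put(dir, new TreeSet<>(all));
        setAuthoritative(dir, flag);
    }

    synchronized Set<String> list(String dir) {
        return new TreeSet<>(children.getOrDefault(dir, Collections.<String>emptySet()));
    }

    synchronized boolean isAuthoritative(String dir) {
        return authoritativeDirs.contains(dir);
    }
}
```

With the per-entry calls, the sequence putChild("/d", "a"), invalidate("/d"), putChild("/d", "b"),
setAuthoritative("/d", true) leaves "/d" marked authoritative while holding only "b".  With
putListing(), the invalidation lands either before or after the whole snapshot, so the directory
is never both authoritative and partial.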

For #1, I will fix the LocalMetadataStore and add tests to catch this sort of case.

For #2, I'd prefer to keep this interface until we get the major patches merged (HADOOP-13631,
HADOOP-13651, and HADOOP-13449) and then do a followup JIRA for any interface changes.  I'm
open to suggestions, though.  What do you think?

> LocalMetadataStore#put(DirListingMetadata) should also put file metadata into fileHash.
> ---------------------------------------------------------------------------------------
>                 Key: HADOOP-13756
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13756
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Lei (Eddy) Xu
> {{LocalMetadataStore#put(DirListingMetadata)}} only puts the metadata into {{dirHash}},
thus all {{FileStatus}}es are missing from {{LocalMetadataStore#fileHash()}}, which makes it
confusing to use.
> So currently, to correctly put file statuses into the store (and also set the {{authoritative}}
flag), you need to run  {code}
> List<PathMetadata> metas = new ArrayList<PathMetadata>();
> boolean authoritative = true;
> for (S3AFileStatus status : files) {
>    PathMetadata meta = new PathMetadata(status);
>    store.put(meta);
>    // collect the entry so the directory listing below is not empty
>    metas.add(meta);
> }
> DirListingMetadata dirMeta = new DirListingMetadata(parent, metas, authoritative);
> store.put(dirMeta);
> {code}
> Solely calling {{store.put(dirMeta)}} is not correct, and calling {{store.put(dirMeta);}}
after putting all the sub-file {{FileStatus}}es repeats work. Can we just use a {{put(PathMetadata)}}
and a {{get/setAuthoritative()}} in the MetadataStore interface instead?
