hadoop-common-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13449) S3Guard: Implement DynamoDBMetadataStore.
Date Fri, 11 Nov 2016 02:07:58 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HADOOP-13449:
    Attachment: HADOOP-13449-HADOOP-13345.003.patch

{quote}
Will make the "DDBMetadataStore" 1:1 mapping with the table. So createTable, deleteTable and
provisionTable will all take no table names.
{quote}
Now in the v3 patch, a {{DynamoDBMetadataStore}} object is associated with only one {{Table}}
after initialization, so it cannot delete or provision other tables by accident.
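
As a rough illustration of the 1:1 binding (a sketch only, not code from the patch): {{FakeTableService}} and {{SingleTableStore}} below are hypothetical stand-ins, not the AWS SDK or the real {{DynamoDBMetadataStore}}. The table name is fixed once in {{initialize}}, and the table operations take no name, so they can only touch that one table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a remote table service; NOT the AWS SDK. */
class FakeTableService {
  // Maps table name to its provisioned capacity.
  final Map<String, Long> capacityByTable = new HashMap<>();

  void create(String name, long capacity) { capacityByTable.put(name, capacity); }
  void update(String name, long capacity) { capacityByTable.put(name, capacity); }
  void delete(String name) { capacityByTable.remove(name); }
  boolean exists(String name) { return capacityByTable.containsKey(name); }
}

/** Hypothetical store that is bound to exactly one table after initialize(). */
class SingleTableStore {
  private final FakeTableService service;
  private String tableName; // assigned once in initialize()

  SingleTableStore(FakeTableService service) { this.service = service; }

  void initialize(String tableName) {
    if (this.tableName != null) {
      throw new IllegalStateException("store is already bound to " + this.tableName);
    }
    this.tableName = tableName;
  }

  // None of the table operations accept a table name.
  void createTable(long capacity) { service.create(boundTable(), capacity); }

  void provisionTable(long capacity) {
    String table = boundTable();
    if (!service.exists(table)) {
      throw new IllegalStateException("table does not exist: " + table);
    }
    service.update(table, capacity);
  }

  void deleteTable() { service.delete(boundTable()); }

  private String boundTable() {
    if (tableName == null) {
      throw new IllegalStateException("store is not initialized");
    }
    return tableName;
  }
}
```

With this shape, a caller holding a store for one table has no API through which to name a different table.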

{quote}
There's an inevitable risk the native libs aren't around/going to work with the native OS
running the build. What policy is good there? Fail? or downgrade to skip? It's probably easiest
to leave it as it is now (fail) and see what needs to change as/when failures surface.
{quote}
That's a very good point. In the v3 patch, the test fails early in the {{setUpBeforeClass}}
method if the local server is not working (e.g. native libs are not loaded correctly). All
the test cases will then be ignored.

{quote}
In short, I'd suggest to let S3AFileSystem ensure the contract.
{quote}
I agree with this point as well. The {{MetadataStore}} assumes all ancestor directories (including
the direct parent directory) have been pre-created by the caller/user. I have to change the base
test {{MetadataStoreTestBase}} to make all the tests pass. We have to change {{LocalMetadataStore}}
accordingly. Ping [~fabbri] and [~stevel@apache.org] for input.
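
To make the division of responsibility concrete, here is a minimal sketch (not code from the patch). {{SimpleStore}} and {{CallerSide}} are hypothetical names: the store records only the path it is given, and the caller (the role {{S3AFileSystem}} would play) is responsible for putting the ancestors first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical store: put() records the given path and nothing else. */
class SimpleStore {
  private final Set<String> paths = new TreeSet<>();

  void put(String path) { paths.add(path); } // no ancestors are created here

  boolean contains(String path) { return paths.contains(path); }
}

/** Hypothetical caller-side helper that upholds the ancestor contract. */
class CallerSide {
  /** Returns the ancestors of an absolute path, shallowest first, excluding the root. */
  static List<String> ancestors(String path) {
    List<String> result = new ArrayList<>();
    int idx = path.lastIndexOf('/');
    while (idx > 0) {
      path = path.substring(0, idx);
      result.add(path);
      idx = path.lastIndexOf('/');
    }
    Collections.reverse(result);
    return result;
  }

  /** Puts every ancestor directory first, then the path itself. */
  static void putWithAncestors(SimpleStore store, String path) {
    for (String ancestor : ancestors(path)) {
      store.put(ancestor);
    }
    store.put(path);
  }
}
```

Keeping the ancestor walk on the caller side keeps the store implementations simple, at the cost that every caller must follow the same contract.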

{quote}
One question I have in the implementation is that, for the initialize() / destroy() functions, can
we provide versions of these functions that do not take S3AFileSystem as a parameter (i.e.,
taking Configuration instead)?
{quote}
Yes, now we have such functionality, see {{DynamoDBMetadataStore#initialize(Configuration
conf)}}. I did not test this thoroughly but the basic idea is feasible. I hope this will help
other tools (e.g. command line) that operate on the metadata store without initializing an S3AFileSystem,
which would unnecessarily check that the bucket exists. Ping [~eddyxu] for more input about this.
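
The shape of the two entry points can be sketched as follows. This is an illustration under assumptions, not the patch itself: a plain {{Map}} stands in for Hadoop's {{Configuration}}, and {{FakeFileSystem}}, {{ConfigurableStore}} and the key {{example.store.table}} are all hypothetical names.

```java
import java.util.Map;

/** Hypothetical file system stand-in; constructing it implies a bucket check. */
class FakeFileSystem {
  final Map<String, String> conf;
  final boolean bucketChecked;

  FakeFileSystem(Map<String, String> conf) {
    this.conf = conf;
    this.bucketChecked = true; // stands for the bucket-existence check a real FS performs
  }
}

/** Hypothetical store offering both initialization paths. */
class ConfigurableStore {
  // Hypothetical configuration key, chosen for this sketch only.
  static final String TABLE_KEY = "example.store.table";

  private String tableName;

  /** File-system path: reuses the configuration the file system already holds. */
  void initialize(FakeFileSystem fs) {
    initialize(fs.conf);
  }

  /** Configuration-only path: no file system, hence no bucket check. */
  void initialize(Map<String, String> conf) {
    String name = conf.get(TABLE_KEY);
    if (name == null || name.isEmpty()) {
      throw new IllegalArgumentException("missing configuration: " + TABLE_KEY);
    }
    this.tableName = name;
  }

  String getTableName() { return tableName; }
}
```

A command-line tool would call the configuration-only overload directly and never construct a file system object.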

I also tried bumping the AWS SDK version to a higher value than {{1.11.0}}, say {{1.11.45}}
(see [HADOOP-13050]), only to find that {{DynamoDBLocal}} was not included yet. We may
have to use a different version for {{DynamoDBLocal}} than for the other AWS SDK modules, or use mocked
objects as replacements (which is not ideal). To ease this, I moved the logic that creates the
DDBClient to the {{MockS3ClientFactory}}, which may be helpful.

Provisioning table is supported.

# Make isEmpty() more efficient and easier to integrate with the file system.
# Implement the isAuthoritative() related methods.
# Make changes for S3AFileSystem integration (see [HADOOP-13651]).

> S3Guard: Implement DynamoDBMetadataStore.
> -----------------------------------------
>                 Key: HADOOP-13449
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13449
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13449-HADOOP-13345.000.patch, HADOOP-13449-HADOOP-13345.001.patch,
HADOOP-13449-HADOOP-13345.002.patch, HADOOP-13449-HADOOP-13345.003.patch
> Provide an implementation of the metadata store backed by DynamoDB.

This message was sent by Atlassian JIRA

