hadoop-common-issues mailing list archives

From "Lei (Eddy) Xu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.
Date Thu, 05 Jan 2017 14:48:59 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei (Eddy) Xu updated HADOOP-13650:
-----------------------------------
    Attachment: HADOOP-13650-HADOOP-13345.003.patch

Thanks a lot for the feedback and suggestions, [~stevel@apache.org], [~aw] and [~cnauroth].

Uploaded a new patch that addresses the comments, rewrites the shell script following the example of
{{distcp}}, and adds tests for {{init/destroy}} of the metadata store.

bq.  Ideally return a different exit code for an exception
Done

bq. we have the option of JCommander here for arg parsing.

Hi, [~stevel@apache.org], I did not see {{JCommander}} used anywhere in Hadoop, so I followed the
code used in the NameNode disk balancer and used {{CommandFormat}}.  Would that be OK?

bq. might be good to have the option of printing the diff out in a way that's easy to parse
downstream. 

Currently the {{diff}} output is tab-separated, similar to the delimited output of the {{oiv}} tool. I
can add {{XML/JSON}} output in a follow-on JIRA.

bq. Maybe an operation to verify that the metastore is in sync with s3,

Would a {{-q/--quiet}} option to {{diff}}, with a non-zero return value, be sufficient? Should
it return immediately when the first difference is found?
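
To make the question concrete, here is a minimal sketch of the {{-q}} semantics, written in Python rather than the patch's Java. The helper name {{quiet_diff}} and its two inputs are hypothetical and not part of the patch; the sketch only illustrates returning non-zero as soon as the first difference is seen.

```python
from typing import Iterable, Set


def quiet_diff(s3_paths: Iterable[str], ms_paths: Set[str]) -> int:
    """Return 0 if both sides hold the same paths, 1 on the first difference.

    Hypothetical helper: s3_paths stands for a listing of the bucket
    (assumed free of duplicates) and ms_paths for the set of paths known
    to the metadata store.
    """
    seen = 0
    for path in s3_paths:
        if path not in ms_paths:
            return 1  # in S3 but not in the metadata store: stop immediately
        seen += 1
    # Every S3 path was found; a surplus in the metadata store is a difference.
    return 0 if seen == len(ms_paths) else 1
```

With this shape, a caller could pass the result straight to the process exit code, so a script can test "in sync or not" without parsing any output.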

bq. For the comparison, a listFiles(recursive=true) is much faster to list s3 buckets...

This is a good suggestion, but it might be difficult to do in the near future. First, both
{{LocalMetadataStore}} and {{DynamoDBMetadataStore}} currently use hash distribution for directories,
which cannot guarantee the order of the returned results. Second, IIUC, using {{listFiles()}}
recursively in {{O(1)}} space means that we keep two iterators, one on S3 and one on the MS.
Given that either side may be missing a sub-namespace, deciding which iterator to advance
when the two iterators point to different files is more difficult
to implement correctly.  Should we do the optimization after merging to trunk?
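
For reference, the two-iterator comparison could look roughly like the sketch below. It is in Python rather than Java, {{diff_sorted}} is a hypothetical name, and it assumes both listings arrive sorted in the same order without duplicates, which is exactly what the hash-distributed stores do not guarantee today.

```python
from typing import Iterable, Iterator, Tuple


def diff_sorted(s3_paths: Iterable[str],
                ms_paths: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (side, path) for paths present on only one side.

    Assumes both inputs are sorted in the same order and free of duplicates;
    only one pending element per side is held in memory.
    """
    s3_iter, ms_iter = iter(s3_paths), iter(ms_paths)
    s3_cur, ms_cur = next(s3_iter, None), next(ms_iter, None)
    while s3_cur is not None or ms_cur is not None:
        if ms_cur is None or (s3_cur is not None and s3_cur < ms_cur):
            yield ("S3", s3_cur)  # present in S3, missing from the metadata store
            s3_cur = next(s3_iter, None)
        elif s3_cur is None or ms_cur < s3_cur:
            yield ("MS", ms_cur)  # present in the metadata store, missing from S3
            ms_cur = next(ms_iter, None)
        else:
            # Same path on both sides: advance both iterators.
            s3_cur, ms_cur = next(s3_iter, None), next(ms_iter, None)
```

The rule for which iterator to advance is simple only because of the ordering assumption. Without a shared order, a missing sub-namespace on one side cannot be told apart from an entry that simply has not been returned yet.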


Hi, [~liuml07]. As mentioned in the [parent thread|https://issues.apache.org/jira/browse/HADOOP-13345?focusedCommentId=15801188&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15801188],
 I think that, from the CLI arguments perspective, {{init|destroy}} should accept
either a metadata store URI or an s3a URI. I propose the following
CLI parameters:

{code}
hadoop s3a init [-r UNIT] [-w UNIT]  <-g REGION -m dynamodb://table | s3a://bucket>
hadoop s3a destroy <-g REGION -m dynamodb://table | s3a://bucket>
{code}
 What do you think? If we do this, we will need non-trivial changes in S3A and the DDB MS, and we should
file another JIRA for the change.
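
To illustrate the "either/or" rule in the proposal, here is a small Python {{argparse}} sketch of the {{init}} arguments. It is not the {{CommandFormat}} code in the patch. The function name {{parse_init}} and the attribute names ({{read_units}}, {{write_units}}, {{region}}, {{metastore}}, {{s3a_uri}}) are made up for the example, and the {{UNIT}} values are kept as plain strings because their type is not specified here.

```python
import argparse
from typing import List


def parse_init(argv: List[str]) -> argparse.Namespace:
    """Parse the proposed `init` arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="hadoop s3a init")
    parser.add_argument("-r", dest="read_units")   # [-r UNIT]
    parser.add_argument("-w", dest="write_units")  # [-w UNIT]
    parser.add_argument("-g", dest="region")       # -g REGION
    parser.add_argument("-m", dest="metastore")    # -m dynamodb://table
    parser.add_argument("s3a_uri", nargs="?")      # s3a://bucket
    args = parser.parse_args(argv)

    # Exactly one of the two forms must be given.
    if (args.metastore is None) == (args.s3a_uri is None):
        raise ValueError("give either -g REGION -m dynamodb://table or s3a://bucket")
    if args.metastore is not None:
        if args.region is None:
            raise ValueError("-m requires -g REGION")
        if not args.metastore.startswith("dynamodb://"):
            raise ValueError("metadata store URI must start with dynamodb://")
    elif not args.s3a_uri.startswith("s3a://"):
        raise ValueError("bucket URI must start with s3a://")
    return args
```

For example, {{parse_init(["-g", "REGION", "-m", "dynamodb://table"])}} and {{parse_init(["s3a://bucket"])}} are both accepted, while passing neither form, or both, raises {{ValueError}}.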

Thanks. 

> S3Guard: Provide command line tools to manipulate metadata store.
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13650
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13650
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HADOOP-13650-HADOOP-13345.000.patch, HADOOP-13650-HADOOP-13345.001.patch,
HADOOP-13650-HADOOP-13345.002.patch, HADOOP-13650-HADOOP-13345.003.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata store, i.e., to
create or delete the metadata store, or to {{import}} and {{sync}} the file metadata between the metadata
store and S3. 
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

