hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.
Date Thu, 12 Jan 2017 02:45:17 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819969#comment-15819969
] 

Aaron Fabbri commented on HADOOP-13650:
---------------------------------------

Nice work on this [~eddyxu].  Overall I think it looks good.  Nice work on the test code too.

{code}
+   * @param create create the metadata store if not exists.
+   * @return a initialized metadata store.
+   */
+  MetadataStore initMetadataStore(boolean create) throws IOException {
{code}

Minor  comment clarification: "@param create When using DynamoDB, create table if it does
not exist"

{code}
+      }
+    } else {
+      // CLI does not specify metadata store URI, it uses default metadata store
+      // DynamoDB instead.
+      ms = new DynamoDBMetadataStore();
{code}

What if create == true here?

{code}
+  void initS3AFileSystem(String path) throws IOException {
+    URI uri;
{code}
Do we need to enforce that this FileSystem does *not* have a MetadataStore
configured?  (You want raw S3 access without the MetadataStore for this tool.)

{code}
+    private void importDir(S3AFileStatus status) throws IOException {
+      Preconditions.checkArgument(status.isDirectory());
+      RemoteIterator<LocatedFileStatus> it =
+          s3a.listFiles(status.getPath(), true);
+
+      while (it.hasNext()) {
+        LocatedFileStatus located = it.next();
+        S3AFileStatus child;
+        if (located.isDirectory()) {
+          // Because S3 does not actually store directory, this returned dir
+          // must be an empty directory.
{code}

I thought S3A's listFiles discovers non-empty directories via the prefixes returned from S3.
 In fs.s3a.Listing#buildNextStatusBatch().

{code}
+          final boolean isEmptyDir = true;
+          child = new S3AFileStatus(isEmptyDir, located.getPath(),
+              located.getOwner());
{code}

Should we add to dirCache here?

{code}
+    /**
+     * Print difference between two file statuses to the output stream.
+     *
+     * @param statusFromMS file status from metadata store.
+     * @param statusFromS3 file status from S3.
+     * @param out output stream.
+     */
+    private static void printDiff(S3AFileStatus statusFromMS,
{code}

To help future readers of the code, maybe say "Print difference, if any, ..."
in that function comment?

{code}
+    private static void printDiff(S3AFileStatus statusFromMS,
+                                  S3AFileStatus statusFromS3,
+                                  PrintStream out) {
+      Preconditions.checkArgument(
+          !(statusFromMS == null && statusFromS3 == null));
+      if (statusFromMS == null) {
+        out.printf("%s%s%s%n", S3_PREFIX, SEP, formatFileStatus(statusFromS3));
+      } else if (statusFromS3 == null) {
+        out.printf("%s%s%s%n", MS_PREFIX, SEP, formatFileStatus(statusFromMS));
+      }
+      // TODO: Do we need to compare the internal fields of two FileStatuses?
{code}
If so, modification times are likely to not match.  I think your code here is a good first
step and we could add FileStatus comparison in the future.


{code}
+      for (Path path : allPaths) {
+        S3AFileStatus s3status = s3Children.get(path);
+        S3AFileStatus msStatus = msChildren.get(path);
+        printDiff(msStatus, s3status, out);
+        if ((s3status != null && s3status.isDirectory()) ||
+            (msStatus != null && msStatus.isDirectory())) {
+          compareDir(msStatus, s3status, out);
+        }
+      }
{code}

This looks like it will work.. We could also just use two Sets, then, in pseudocode:
msSetCopy = copyOf(msSet)
print "In MetadataStore but not S3: ", msSetCopy.removeAll(s3aSet)
print "In S3 but not in MetadataStore: ", s3aSet.removeAll(msSet)

Your approach allows for adding comparison of the two FileStatus', in addition
to just checking Path set memberships, so it is actually more flexible. (So your code looks
good here)

Also in your Diff class:
{code}
+    public int run(String[] args, PrintStream out) throws IOException {
+      List<String> paths = parseArgs(args);
+      if (paths.isEmpty()) {
+        out.println(USAGE);
+        return INVALID_ARGUMENT;
+      }
+      String s3Path = paths.get(0);
+      // Make sure that S3AFileSystem does not hold an actual MetadataStore
+      // implementation.
+      Configuration conf = getConf();
+      conf.setClass(S3_METADATA_STORE_IMPL, NullMetadataStore.class,
+          MetadataStore.class);
{code}
Ahh. you enforce no MetadataStore here.. Should we move this up to initS3AFileSystem()?
Seems like all commands should have this?



> S3Guard: Provide command line tools to manipulate metadata store.
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13650
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13650
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HADOOP-13650-HADOOP-13345.000.patch, HADOOP-13650-HADOOP-13345.001.patch,
HADOOP-13650-HADOOP-13345.002.patch, HADOOP-13650-HADOOP-13345.003.patch, HADOOP-13650-HADOOP-13345.004.patch,
HADOOP-13650-HADOOP-13345.005.patch, HADOOP-13650-HADOOP-13345.006.patch, HADOOP-13650-HADOOP-13345.007.patch,
HADOOP-13650-HADOOP-13345.008.patch
>
>
> Similar systems like EMRFS has the CLI tools to manipulate the metadata store, i.e.,
create or delete metadata store, or {{import}}, {{sync}} the file metadata between metadata
store and S3. 
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message