Subject: [jira] [Commented] (HADOOP-13448) S3Guard: Define MetadataStore interface.
From: "Chris Nauroth (JIRA)"
To: common-issues@hadoop.apache.org
Date: Wed, 10 Aug 2016 22:28:20 +0000 (UTC)

[ https://issues.apache.org/jira/browse/HADOOP-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416136#comment-15416136 ]

Chris Nauroth commented on HADOOP-13448:
----------------------------------------

bq.
Why does your {{DynamoDBConsistentStore#save()}} implementation walk the path to the root and save all ancestor paths as well?

That's a good observation. I think this is a weakness of my prototype, not a deliberate choice meant to carry through to the full implementation.

More specifically, I built my prototype as a separate hadoop-s3guard module with a new {{ConsistentS3AFileSystem}} class defined as a subclass of the existing {{S3AFileSystem}} class. The benefit of this approach was that I didn't need to make many code changes directly in hadoop-aws, so I could develop the prototype isolated from the churn of merge conflicts on upstream hadoop-aws patches. (A lot of optimization and bug fixing was happening concurrently at the time.) The drawback was that it constrained my implementation. For {{mkdirs}}, I could only call the superclass and then pass the path to {{ConsistentStore#save}}, so the consistent store code needed a complete implementation using solely that path argument. There was no way for me to preserve the information discovered in {{S3AFileSystem#innerMkdirs}} about which intermediate directories were missing, as was done in your prototype.

I came to the conclusion that the subclassing approach wouldn't be ideal for reasons like this. We can get better results by hooking more deeply into implementation details, and that led me to the refactoring proposed on HADOOP-13447. Across {{S3Store}}, {{AbstractS3AccessPolicy}} and the {{MetadataStore}} interface, we should feel free to evolve those interfaces however best suits the requirements. They are internal interfaces, so they don't need to be constrained by the Hadoop compatibility guidelines, as long as {{S3AFileSystem}} can translate back to the public {{FileSystem}} interface at the end.
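A minimal sketch of why the subclassing approach pushes the ancestor walk into the store. It only models the shape described above: {{BaseS3FileSystem}}, {{ConsistentFileSystem}} and {{AncestorWalkingStore}} are hypothetical, simplified stand-ins for {{S3AFileSystem}}, {{ConsistentS3AFileSystem}} and {{ConsistentStore}}, not real Hadoop classes, and paths are plain strings. Because the subclass sees only the path argument, {{save}} has to record the path and every ancestor up to the root itself.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for S3AFileSystem: mkdirs reports only success or
// failure, so knowledge of which ancestors were missing stays inside this class.
class BaseS3FileSystem {
  boolean mkdirs(String path) {
    // Real code would probe the object store and create missing ancestors here.
    return true;
  }
}

// Hypothetical stand-in for the prototype's ConsistentStore.
class AncestorWalkingStore {
  // Insertion-ordered so the walk order is visible when printed.
  final Set<String> entries = new LinkedHashSet<>();

  // Given only a path, the store cannot tell which ancestors already existed,
  // so it records the path and every ancestor up to and including the root.
  void save(String path) {
    for (String p = path; p != null; p = parentOf(p)) {
      entries.add(p);
    }
  }

  // Returns the parent of an absolute path, or null for the root.
  static String parentOf(String path) {
    if (path.equals("/")) {
      return null;
    }
    int slash = path.lastIndexOf('/');
    return slash == 0 ? "/" : path.substring(0, slash);
  }
}

// Hypothetical stand-in for ConsistentS3AFileSystem: it can only call the
// superclass and then hand the bare path to the store.
class ConsistentFileSystem extends BaseS3FileSystem {
  final AncestorWalkingStore store = new AncestorWalkingStore();

  @Override
  boolean mkdirs(String path) {
    boolean ok = super.mkdirs(path);
    if (ok) {
      store.save(path);
    }
    return ok;
  }
}

class AncestorWalkDemo {
  public static void main(String[] args) {
    ConsistentFileSystem fs = new ConsistentFileSystem();
    fs.mkdirs("/a/b/c");
    System.out.println(fs.store.entries);
  }
}
```

Running {{AncestorWalkDemo}} should print {{[/a/b/c, /a/b, /a, /]}}: the store writes an entry for every ancestor on each call, whether or not those directories already existed, because the subclass never learns which ones {{mkdirs}} actually created.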
In the example you gave here, evolving those interfaces might mean something like {{S3Store#mkdirs}} returning a result object that lists which directories in the ancestry were not pre-existing.

Another, smaller reason my prototype worked that way is that it was also easy to hook a call to {{ConsistentStore#save}} onto the close of the stream returned by {{FileSystem#create}}. Unlike {{mkdirs}}, {{create}} does not walk up the ancestry to check for pre-existing directories, so I had to take care of that entirely within my code. This is really more of a bug in the existing S3A code that I was working around, though. (See HADOOP-13221.)

> S3Guard: Define MetadataStore interface.
> ----------------------------------------
>
>                 Key: HADOOP-13448
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13448
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>
> Define the common interface for metadata store operations. This is the interface that any metadata back-end must implement in order to integrate with S3Guard.
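The result-object suggestion for {{S3Store#mkdirs}} in the comment above could look roughly like the following minimal sketch. All names here ({{MkdirsResult}}, {{SimpleS3Store}}, {{SimpleMetadataStore}}, {{InMemoryMetadataStore}}, {{MkdirsResultDemo}}) are hypothetical, the S3 side is simulated with an in-memory set, and nothing reflects actual hadoop-aws signatures. {{mkdirs}} walks up only until it reaches an existing ancestor, the same kind of information {{S3AFileSystem#innerMkdirs}} discovers, and returns the directories it created so the metadata store writes just those.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical result object: the directories mkdirs actually created,
// ordered from the shallowest missing ancestor down to the target path.
final class MkdirsResult {
  final List<String> created;

  MkdirsResult(List<String> created) {
    this.created = Collections.unmodifiableList(created);
  }
}

// Hypothetical, deliberately tiny metadata store contract (not the real MetadataStore).
interface SimpleMetadataStore {
  void put(String path);

  boolean contains(String path);
}

class InMemoryMetadataStore implements SimpleMetadataStore {
  private final Set<String> dirs = new HashSet<>();
  int puts = 0; // counts writes, to show that only new directories are stored

  @Override
  public void put(String path) {
    dirs.add(path);
    puts++;
  }

  @Override
  public boolean contains(String path) {
    return dirs.contains(path);
  }
}

// Hypothetical store; "existing" simulates the directories already present in S3.
class SimpleS3Store {
  private final Set<String> existing = new HashSet<>(Collections.singleton("/"));

  MkdirsResult mkdirs(String path) {
    List<String> missing = new ArrayList<>();
    // Walk up until an existing ancestor is found, remembering what was missing.
    for (String p = path; p != null && !existing.contains(p); p = parentOf(p)) {
      missing.add(0, p); // prepend so the list runs root-to-leaf
    }
    existing.addAll(missing);
    return new MkdirsResult(missing);
  }

  // Returns the parent of an absolute path, or null for the root.
  static String parentOf(String path) {
    if (path.equals("/")) {
      return null;
    }
    int slash = path.lastIndexOf('/');
    return slash == 0 ? "/" : path.substring(0, slash);
  }
}

class MkdirsResultDemo {
  // Records only the directories reported as newly created.
  static void recordCreated(MkdirsResult result, SimpleMetadataStore store) {
    for (String dir : result.created) {
      store.put(dir);
    }
  }

  public static void main(String[] args) {
    SimpleS3Store s3 = new SimpleS3Store();
    InMemoryMetadataStore ms = new InMemoryMetadataStore();
    recordCreated(s3.mkdirs("/a/b"), ms);   // creates /a and /a/b
    recordCreated(s3.mkdirs("/a/b/c"), ms); // creates only /a/b/c
    System.out.println(ms.puts);
  }
}
```

In this sketch the second call writes only {{/a/b/c}}, since {{/a}} and {{/a/b}} already exist, so the demo should print {{3}}. A store that re-walks to the root on every call would instead rewrite the root and every existing ancestor each time.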