Date: Thu, 19 Oct 2017 17:55:00 +0000 (UTC)
From: "Aaron Fabbri (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-14098) AliyunOSS: improve the performance of object metadata operation

[ https://issues.apache.org/jira/browse/HADOOP-14098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211442#comment-16211442 ]

Aaron Fabbri commented on HADOOP-14098:
---------------------------------------

I made efforts to keep the MetadataStore part of S3Guard a separate layer that other filesystems could use.

In S3Guard, we use a MetadataStore as a trailing log of metadata edits, used to guard against list/stat inconsistency. The MetadataStore interface is also designed to be used as a limited-lifetime cache of FileStatus objects that is demand-loaded and does not need to contain metadata for all files in the underlying FS (e.g., implementations may cache only recently seen entries).

Some pointers to get started:

1. The [MetadataStore|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java] interface. Note there are zero imports of S3A code there.
It does live in hadoop-tools/hadoop-aws, but I expect us to move it to a common place as soon as another FileSystem uses it. (Note: there used to be S3A-specific code for empty-directory behavior, but that has been fixed and removed from the MetadataStore layer.)

2. A local (in-memory) implementation of the interface is [LocalMetadataStore|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/LocalMetadataStore.java]. This is not for production use at this time: the goal is to have an easy-to-run reference implementation for tests. It is not a perfect implementation, but it is small (<500 LOC) and supports the authoritative directory listing bit (which the Dynamo implementation does not yet support). You could use this as a test implementation for integrating with a FileSystem.

3. There is a nice set of contract tests that validate that an implementation (i.e., a different back end) of MetadataStore works correctly. The base test class is [MetadataStoreTestBase|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java]. If you wanted to develop a new back end, you could essentially use this for test-driven development.

I agree with Steve L that this is a large amount of work.

A major caveat of the approach is the lack of transactions around updating the two sources of truth (the FS and the MetadataStore). This means things can get out of sync when failures happen. The cost of transactions is prohibitive with the back end we use (Dynamo), so our strategy for dealing with this is to (1) use soft state (entries in the MetadataStore are expired via a scheduled prune CLI command) and (2) have good CLI tools for detecting and fixing any inconsistencies.

Another feature of the design is that it is always safe to delete all the data in the MetadataStore. Think of it as a cache flush that can be used to clear inconsistencies in the worst case.
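As a rough illustration of the soft-state idea (expire old entries with a scheduled prune, flush everything in the worst case), here is a minimal Java sketch. The class SoftStateMetadataCache and all of its methods are hypothetical and are not the S3Guard MetadataStore API; a String stands in for a FileStatus object, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical soft-state metadata cache (illustration only).
 * This is NOT the S3Guard MetadataStore interface; a String stands in
 * for a FileStatus object. Not thread-safe.
 */
class SoftStateMetadataCache {

  /** A cached status plus the time it was last written. */
  private static final class CachedStatus {
    final String status;
    final long lastUpdatedMillis;

    CachedStatus(String status, long lastUpdatedMillis) {
      this.status = status;
      this.lastUpdatedMillis = lastUpdatedMillis;
    }
  }

  private final Map<String, CachedStatus> entries = new HashMap<>();

  /** Record (or overwrite) the metadata seen for a path. */
  void put(String path, String status, long nowMillis) {
    entries.put(path, new CachedStatus(status, nowMillis));
  }

  /** Return the cached status, or null on a miss (the caller then asks the FS). */
  String get(String path) {
    CachedStatus cached = entries.get(path);
    return cached == null ? null : cached.status;
  }

  /** Soft state: drop entries last updated before the cutoff; return how many were dropped. */
  int prune(long cutoffMillis) {
    int before = entries.size();
    entries.values().removeIf(e -> e.lastUpdatedMillis < cutoffMillis);
    return before - entries.size();
  }

  /** Worst-case recovery: flush the whole cache. */
  void clear() {
    entries.clear();
  }

  int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    SoftStateMetadataCache cache = new SoftStateMetadataCache();
    cache.put("/data/old.txt", "file", 1_000L);
    cache.put("/data/new.txt", "file", 5_000L);

    // Expire everything last updated before t=3000; only old.txt qualifies.
    int pruned = cache.prune(3_000L);
    System.out.println("pruned=" + pruned + " remaining=" + cache.size());

    // Flushing is always allowed in this sketch: later lookups simply miss.
    cache.clear();
    System.out.println("after flush: " + cache.size());
  }
}
```

In this sketch a miss only means the caller has to consult the underlying FS, which is why pruning or clearing costs performance rather than correctness here.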
(Also: Deleting *some* of the data may or may not be safe, depending on the implementation of the MetadataStore.)

> AliyunOSS: improve the performance of object metadata operation
> ---------------------------------------------------------------
>
> Key: HADOOP-14098
> URL: https://issues.apache.org/jira/browse/HADOOP-14098
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 3.0.0-alpha2
> Reporter: Genmao Yu
> Assignee: Genmao Yu
>
> Open this JIRA to research and address the potential request performance issue.