Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 79D57200D09 for ; Tue, 12 Sep 2017 15:00:12 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 789E41609C8; Tue, 12 Sep 2017 13:00:12 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id BD7E41609C6 for ; Tue, 12 Sep 2017 15:00:11 +0200 (CEST) Received: (qmail 80003 invoked by uid 500); 12 Sep 2017 13:00:09 -0000 Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list common-issues@hadoop.apache.org Received: (qmail 79992 invoked by uid 99); 12 Sep 2017 13:00:09 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 12 Sep 2017 13:00:09 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id B845518C6A8 for ; Tue, 12 Sep 2017 13:00:08 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id AFidk53a8yBE for ; Tue, 12 Sep 2017 13:00:07 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 20BF661029 for ; Tue, 12 Sep 2017 13:00:06 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 690EFE010F for ; Tue, 12 Sep 2017 13:00:05 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 2631C24137 for ; Tue, 12 Sep 2017 13:00:05 +0000 (UTC) Date: Tue, 12 Sep 2017 13:00:05 +0000 (UTC) From: "Steve Loughran (JIRA)" To: common-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Tue, 12 Sep 2017 13:00:12 -0000 [ https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162889#comment-16162889 ] Steve Loughran commented on HADOOP-13761: ----------------------------------------- Note that the DDB throttled EX is 400; you can't use RESTy error code logic here, Also, in {{waitForTableActive()}} failures are remapped to Illegal Argument Exceptions. > S3Guard: implement retries for DDB failures and throttling > ---------------------------------------------------------- > > Key: HADOOP-13761 > URL: https://issues.apache.org/jira/browse/HADOOP-13761 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 > Affects Versions: 3.0.0-beta1 > Reporter: Aaron Fabbri > Assignee: Steve Loughran > > Following the S3AFileSystem integration patch in HADOOP-13651, we need to add retry logic. > In HADOOP-13651, I added TODO comments in most of the places retry loops are needed, including: > - open(path). If MetadataStore reflects recent create/move of file path, but we fail to read it from S3, retry. > - delete(path). If deleteObject() on S3 fails, but MetadataStore shows the file exists, retry. > - rename(src,dest). If source path is not visible in S3 yet, retry. > - listFiles(). Skip for now. Not currently implemented in S3Guard. I will create a separate JIRA for this as it will likely require interface changes (i.e. prefix or subtree scan). > We may miss some cases initially and we should do failure injection testing to make sure we're covered. Failure injection tests can be a separate JIRA to make this easier to review. > We also need basic configuration parameters around retry policy. There should be a way to specify maximum retry duration, as some applications would prefer to receive an error eventually, than waiting indefinitely. We should also be keeping statistics when inconsistency is detected and we enter a retry loop. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org For additional commands, e-mail: common-issues-help@hadoop.apache.org