From: GitBox
To: common-issues@hadoop.apache.org
Subject: [GitHub] [hadoop] shanemkm commented on a change in pull request #2246: Hadoop-17215. ABFS: Disable default create overwrite
Message-ID: <159893171941.32230.15808936712636867875.asfpy@gitbox.apache.org>
Date: Tue, 01 Sep 2020 03:41:59 -0000

shanemkm commented on a change in pull request #2246:
URL: https://github.com/apache/hadoop/pull/2246#discussion_r480718118

##########
File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
##########
@@ -271,10 +272,67 @@ public AbfsRestOperation deleteFilesystem() throws AzureBlobFileSystemException
     return op;
   }
 
-  public AbfsRestOperation createPath(final String path, final boolean isFile, final boolean overwrite,
-                                      final String permission, final String umask,
-                                      final boolean isAppendBlob) throws AzureBlobFileSystemException {
+  public AbfsRestOperation createPath(final String path,
+      final boolean isFile,
+      final boolean overwrite,
+      final String permission,
+      final String umask,
+      final boolean isAppendBlob) throws AzureBlobFileSystemException {
+    String operation = isFile
+        ? SASTokenProvider.CREATE_FILE_OPERATION
+        : SASTokenProvider.CREATE_DIRECTORY_OPERATION;
+
+    // HDFS FS defaults overwrite behaviour to true for create file which leads
+    // to majority create API traffic with overwrite=true. In some cases, this
+    // will end in race conditions at backend with parallel operations issued to
+    // same path either by means of the customer workload or ABFS driver retry.
+    // Disabling the create overwrite default setting to false should
+    // significantly reduce the chances for such race conditions.
+    boolean isFirstAttemptToCreateWithoutOverwrite = false;
+    if (isFile && overwrite
+        && abfsConfiguration.isDefaultCreateOverwriteDisabled()) {
+      isFirstAttemptToCreateWithoutOverwrite = true;
+    }
+
+    AbfsRestOperation op = null;
+    // Query builder
+    final AbfsUriQueryBuilder abfsUriQueryBuilder = createDefaultUriQueryBuilder();
+    abfsUriQueryBuilder.addQuery(QUERY_PARAM_RESOURCE,
+        operation.equals(SASTokenProvider.CREATE_FILE_OPERATION)
+            ? FILE
+            : DIRECTORY);
+    if (isAppendBlob) {
+      abfsUriQueryBuilder.addQuery(QUERY_PARAM_BLOBTYPE, APPEND_BLOB_TYPE);
+    }
+
+    appendSASTokenToQuery(path, operation, abfsUriQueryBuilder);
+
+    try {
+      op = createPathImpl(path, abfsUriQueryBuilder,

Review comment:
In createPathImpl, it will go through retries with backoff inside of that, correct?
I just want to make sure, since our observation was that those retries almost always succeed, which is why this change should remove the race condition and stale write requests almost entirely.
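For readers following the thread, the idea described in the code comment above (try the create with overwrite=false first, even when the caller asked for overwrite) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in the pull request: `FakeStore`, `PathExistsException`, and `createFile` are hypothetical names, the in-memory map stands in for the remote service, and the fallback to a plain overwrite=true request on conflict is assumed here because the quoted diff is cut off before its catch block. Retries with backoff are deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names come from the ABFS driver. */
final class CreateOverwriteSketch {

  /** Stand-in for a "path already exists" conflict reported by the service. */
  static final class PathExistsException extends RuntimeException {
    PathExistsException(String path) {
      super("Path already exists: " + path);
    }
  }

  /** Toy in-memory store standing in for the remote file system. */
  static final class FakeStore {
    private final Map<String, String> files = new ConcurrentHashMap<>();

    void create(String path, String content, boolean overwrite) {
      if (overwrite) {
        files.put(path, content);
      } else if (files.putIfAbsent(path, content) != null) {
        // putIfAbsent is atomic, so only one overwrite=false create can win.
        throw new PathExistsException(path);
      }
    }

    String read(String path) {
      return files.get(path);
    }
  }

  /**
   * Creates a file and returns the number of create requests sent to the store.
   * When the caller asks for overwrite and the default overwrite is disabled,
   * the first request is sent with overwrite=false.
   */
  static int createFile(FakeStore store, String path, String content,
      boolean overwrite, boolean defaultOverwriteDisabled) {
    boolean firstAttemptWithoutOverwrite = overwrite && defaultOverwriteDisabled;
    if (!firstAttemptWithoutOverwrite) {
      store.create(path, content, overwrite);
      return 1;
    }
    try {
      store.create(path, content, false);
      return 1;
    } catch (PathExistsException e) {
      // Assumed fallback: the caller did request overwrite, so honour it now.
      store.create(path, content, true);
      return 2;
    }
  }

  public static void main(String[] args) {
    FakeStore store = new FakeStore();
    System.out.println(createFile(store, "/data/part-0", "v1", true, true)); // prints 1
    System.out.println(createFile(store, "/data/part-0", "v2", true, true)); // prints 2
  }
}
```

In this toy model a create on a fresh path needs a single overwrite=false request, and only a create on an existing path pays for a second request. That matches the motivation in the code comment: most traffic no longer carries overwrite=true, so a retried or parallel request is less likely to clobber a newer write.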