Date: Tue, 2 Aug 2016 08:43:20 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-13403) AzureNativeFileSystem rename/delete performance improvements

    [ https://issues.apache.org/jira/browse/HADOOP-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403632#comment-15403632 ]

Hadoop QA commented on HADOOP-13403:
------------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} hadoop-tools/hadoop-azure: The patch generated 0 new + 43 unchanged - 1 fixed = 43 total (was 44) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 18s{color} | {color:green} hadoop-azure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 50s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821543/HADOOP-13403-004.patch |
| JIRA Issue | HADOOP-13403 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux e987e0228780 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a5fb298 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/10150/testReport/ |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/10150/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> AzureNativeFileSystem rename/delete performance improvements
> ------------------------------------------------------------
>
>                 Key: HADOOP-13403
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13403
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: azure
>    Affects Versions: 2.7.2
>            Reporter: Subramanyam Pattipaka
>            Assignee: Subramanyam Pattipaka
>             Fix For: 2.9.0
>
>         Attachments: HADOOP-13403-001.patch, HADOOP-13403-002.patch, HADOOP-13403-003.patch, HADOOP-13403-004.patch
>
>
> WASB Performance Improvements
> Problem
> -----------
> Azure Native File System operations such as rename and delete perform poorly when the source directory contains a large number of directories and/or files. Possible reasons:
> a) We first list all files under the source directory hierarchically. This is a serial operation.
> b) After collecting the entire list of files under a folder, we delete or rename the files one by one, serially.
> c) No logging is available for these costly operations, even in DEBUG mode, which makes WASB performance issues hard to understand.
> Proposal
> -------------
> Step 1: Rename and delete operations generate a list of all files under the source folder. We need to use the Azure flat listing option to get the list with a single request to the Azure store. We have introduced the config fs.azure.flatlist.enable to enable this option. The default value is 'false', which means flat listing is disabled.
> Step 2: Create the thread pool and threads dynamically based on user configuration. These thread pools will be deleted after the operation is over. We are introducing two new configs:
>     a) fs.azure.rename.threads: Config to set the number of rename threads. Default value is 0, which means no threading.
>     b) fs.azure.delete.threads: Config to set the number of delete threads. Default value is 0, which means no threading.
>     We log, at debug level, the number of threads not used for the operation, which can be useful.
>     Failure Scenarios:
>     If we fail to create the thread pool for ANY reason (for example, trying to create it with a very large thread count such as 1000000), we fall back to serial operation.
> Step 3: Blob operations can be done in parallel using multiple threads executing the following snippet:
>     while ((currentIndex = fileIndex.getAndIncrement()) < files.length) {
>         FileMetadata file = files[currentIndex];
>         Rename/delete(file);
>     }
>     The above strategy depends on the fact that all files are stored in a final array and each thread atomically obtains the next index to work on. The advantage of this strategy is that even if the user configures a large number of unusable threads, we always ensure that work doesn’t get serialized due to lagging threads.
>     We log the following information, which can be useful for tuning the number of threads:
>     a) Number of unusable threads
>     b) Time taken by each thread
>     c) Number of files processed by each thread
>     d) Total time taken for the operation
>     Failure Scenarios:
>     Failure to queue a thread execute request shouldn’t be an issue as long as at least one thread has completed execution successfully. If we couldn't schedule even one thread, we should take the serial path. Exceptions raised while executing threads are still considered regular exceptions and are returned to the client as an operation failure. Exceptions raised while stopping threads and deleting the thread pool can be ignored if the operation completed on all files without any issue.
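
For readers following Steps 2 and 3, here is a minimal Java sketch of the shared-index strategy with a serial fallback. It is not the code from the HADOOP-13403 patch: the class ParallelFileOps, the method processAll, and the generic Consumer callback are hypothetical names chosen for illustration. Capping the worker count at the number of files and treating a thread count of 1 as serial are assumptions of this sketch, not behavior stated in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not the code from the HADOOP-13403 patch.
final class ParallelFileOps {

  // Applies action to every element of files and returns the number of worker
  // tasks that were scheduled (0 means the calling thread did all the work).
  static <T> int processAll(T[] files, int threadCount, Consumer<T> action) {
    AtomicInteger fileIndex = new AtomicInteger(0);

    // Loop from the proposal: each worker claims the next unprocessed index.
    Runnable worker = () -> {
      int currentIndex;
      while ((currentIndex = fileIndex.getAndIncrement()) < files.length) {
        action.accept(files[currentIndex]);
      }
    };

    // Assumption of this sketch: never start more workers than there are files.
    int workers = Math.min(threadCount, files.length);
    ExecutorService pool = null;
    List<Future<?>> futures = new ArrayList<>();
    if (workers > 1) {
      try {
        pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
          futures.add(pool.submit(worker));
        }
      } catch (RuntimeException e) {
        // Pool creation or scheduling failed: keep whatever was scheduled.
      }
    }

    try {
      if (futures.isEmpty()) {
        worker.run(); // serial fallback on the calling thread
      }
      for (Future<?> f : futures) {
        f.get(); // surfaces a worker failure as ExecutionException
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    } catch (ExecutionException e) {
      // A failure in any worker is reported to the caller as an operation failure.
      throw new IllegalStateException("rename/delete failed", e.getCause());
    } finally {
      if (pool != null) {
        pool.shutdown();
      }
    }
    return futures.size();
  }
}
```

A call such as ParallelFileOps.processAll(files, 8, f -> delete(f)), where delete is whatever per-file operation the caller supplies, would spread the array across up to eight workers. Because every worker pulls from the same counter, a slow or late-starting thread only delays the items it has already claimed, which is the property the proposal relies on.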
--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org