hadoop-common-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13403) AzureNativeFileSystem rename/delete performance improvements
Date Sat, 30 Jul 2016 08:03:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400530#comment-15400530 ]

Hadoop QA commented on HADOOP-13403:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 11s{color} | {color:orange} hadoop-tools/hadoop-azure: The patch generated 2 new + 43 unchanged - 1 fixed = 45 total (was 44) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m  9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 19s{color} | {color:green} hadoop-azure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 22s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821152/HADOOP-13403-003.patch |
| JIRA Issue | HADOOP-13403 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 1d2583fec78d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d32bd8 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/10128/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/10128/artifact/patchprocess/whitespace-eol.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/10128/testReport/ |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/10128/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> AzureNativeFileSystem rename/delete performance improvements
> ------------------------------------------------------------
>
>                 Key: HADOOP-13403
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13403
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: azure
>    Affects Versions: 2.7.2
>            Reporter: Subramanyam Pattipaka
>            Assignee: Subramanyam Pattipaka
>             Fix For: 2.9.0
>
>         Attachments: HADOOP-13403-001.patch, HADOOP-13403-002.patch, HADOOP-13403-003.patch
>
>
> WASB Performance Improvements
> Problem
> -----------
> Azure Native File System operations such as rename and delete experience performance issues when the source directory contains a large number of directories and/or files. Possible reasons:
> a)	We first list all files under the source directory hierarchically. This is a serial operation.
> b)	After collecting the entire list of files under a folder, we delete or rename the files one by one, serially.
> c)	No logging information is available for these costly operations, even in DEBUG mode, which makes WASB performance issues difficult to understand.
> Proposal
> -------------
> Step 1: Rename and delete operations generate a list of all files under the source folder. We need to use the Azure flat listing option to get the list with a single request to the Azure store. We have introduced the config fs.azure.flatlist.enable to enable this option. The default value is 'false', which means flat listing is disabled.
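
To illustrate why flat listing saves round trips, here is a minimal Java sketch that simulates both listing styles against an in-memory set of blob names. The class FlatListingSketch, its methods, and the request counter are hypothetical stand-ins, not the WASB or Azure SDK API, and a real store may additionally page large results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class FlatListingSketch {
    // Hypothetical stand-in for a blob container: a sorted set of blob names.
    private final TreeSet<String> blobs = new TreeSet<>();
    int requests = 0; // counts simulated round trips to the store

    FlatListingSketch(List<String> names) {
        blobs.addAll(names);
    }

    // Flat listing: a single request returns every blob under the prefix.
    List<String> listFlat(String prefix) {
        requests++;
        List<String> out = new ArrayList<>();
        for (String name : blobs.tailSet(prefix, true)) {
            if (!name.startsWith(prefix)) break; // left the prefix range
            out.add(name);
        }
        return out;
    }

    // Hierarchical listing: one request per directory, recursing into subdirectories.
    List<String> listHierarchical(String prefix) {
        requests++;
        List<String> out = new ArrayList<>();
        TreeSet<String> subdirs = new TreeSet<>();
        for (String name : blobs.tailSet(prefix, true)) {
            if (!name.startsWith(prefix)) break;
            int slash = name.indexOf('/', prefix.length());
            if (slash < 0) {
                out.add(name);                              // file at this level
            } else {
                subdirs.add(name.substring(0, slash + 1));  // virtual directory
            }
        }
        for (String dir : subdirs) {
            out.addAll(listHierarchical(dir));
        }
        return out;
    }

    public static void main(String[] args) {
        FlatListingSketch store = new FlatListingSketch(Arrays.asList(
            "src/a.txt", "src/d1/b.txt", "src/d1/d2/c.txt", "src/d3/e.txt"));
        store.listFlat("src/");
        System.out.println("flat requests: " + store.requests);
        store.requests = 0;
        store.listHierarchical("src/");
        System.out.println("hierarchical requests: " + store.requests);
    }
}
```

In this toy layout the flat listing needs one simulated request, while the hierarchical walk needs one per directory (four here), and the gap grows with the number of directories.
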
> Step 2: Create a thread pool and threads dynamically based on user configuration. These thread pools will be deleted after the operation is over. We are introducing two new configs:
> 	a)	fs.azure.rename.threads: Config to set the number of rename threads. The default value is 0, which means no threading.
> 	b)	fs.azure.delete.threads: Config to set the number of delete threads. The default value is 0, which means no threading.
> 	We have provided debug log information on the number of threads not used for the operation, which can be useful.
> 	Failure Scenarios:
> 	If we fail to create the thread pool for ANY reason (for example, trying to create it with a very large thread count such as 1000000), we fall back to serial operation.
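
The fallback described in Step 2 could be sketched roughly as follows. This is an illustration under assumptions, not the patch's code: PoolFallbackSketch, runAll, and the injectable poolFactory are hypothetical names, the factory exists only so that a creation failure can be simulated, and the sketch catches only RuntimeException where the description says "ANY reason".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

class PoolFallbackSketch {
    // Runs all tasks and reports which path was taken: "parallel" or "serial".
    static String runAll(List<Runnable> tasks, int configuredThreads,
                         IntFunction<ExecutorService> poolFactory) {
        ExecutorService pool = null;
        if (configuredThreads > 0) {          // 0 means no threading
            try {
                pool = poolFactory.apply(configuredThreads);
            } catch (RuntimeException e) {
                pool = null;                  // pool creation failed: fall back to serial
            }
        }
        if (pool == null) {
            for (Runnable task : tasks) {
                task.run();                   // serial path
            }
            return "serial";
        }
        try {
            for (Runnable task : tasks) {
                pool.execute(task);
            }
        } finally {
            pool.shutdown();                  // the pool is discarded once the operation is over
        }
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "parallel";
    }

    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            tasks.add(done::incrementAndGet);
        }
        System.out.println(runAll(tasks, 4, Executors::newFixedThreadPool));
        System.out.println(runAll(tasks, 0, Executors::newFixedThreadPool));
        System.out.println("tasks run: " + done.get());
    }
}
```

Passing a factory that throws exercises the failure scenario: the tasks still run, just serially.
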
> Step 3: Blob operations can be done in parallel using multiple threads executing the following snippet:
> 	while ((currentIndex = fileIndex.getAndIncrement()) < files.length) {
> 		FileMetadata file = files[currentIndex];
> 		renameOrDelete(file);	// placeholder: rename or delete, depending on the operation
> 	}
> 	The above strategy depends on the fact that all files are stored in a final array and each thread determines, in a synchronized manner, the next index to work on. The advantage of this strategy is that even if the user configures a large number of unusable threads, we always ensure that work doesn’t get serialized due to lagging threads.
> 	We are logging the following information, which can be useful for tuning the number of threads:
> 	a) Number of unusable threads
> 	b) Time taken by each thread
> 	c) Number of files processed by each thread
> 	d) Total time taken for the operation
> 	Failure Scenarios:
> 	Failure to queue a thread execute request shouldn’t be an issue as long as we can ensure that at least one thread has completed execution successfully. If we couldn't schedule even one thread, then we should take the serial path. Exceptions raised while executing threads are still considered regular exceptions and are returned to the client as an operation failure. Exceptions raised while stopping threads and deleting the thread pool can be ignored if the operation on all files completed without any issue.
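
A self-contained version of the shared-index loop from Step 3 might look like the sketch below. It uses plain threads and strings instead of FileMetadata, and the names SharedIndexSketch and processAll are hypothetical. The per-thread counts it returns correspond to item c) of the logged information; a thread that started late and found no work left reports 0.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class SharedIndexSketch {
    // Applies op to every element exactly once, using `threads` workers
    // (threads must be positive). Returns how many elements each worker processed.
    static int[] processAll(String[] files, int threads, Consumer<String> op) {
        final AtomicInteger fileIndex = new AtomicInteger(0);
        final int[] processed = new int[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                int currentIndex;
                // Each worker atomically claims the next unprocessed index.
                while ((currentIndex = fileIndex.getAndIncrement()) < files.length) {
                    op.accept(files[currentIndex]);
                    processed[id]++;   // only this worker writes slot `id`
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();         // join makes the counts visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        String[] files = new String[100];
        for (int i = 0; i < files.length; i++) {
            files[i] = "file-" + i;
        }
        Set<String> seen = ConcurrentHashMap.newKeySet();
        int[] counts = processAll(files, 4, seen::add);
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        System.out.println("processed " + total + " files, distinct: " + seen.size());
    }
}
```

Because no file is assigned to a thread ahead of time, a slow or never-started thread cannot hold work back; the remaining threads simply claim the indices it would have taken.
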



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

