hadoop-common-issues mailing list archives

From "Yi Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10048) LocalDirAllocator should avoid holding locks while accessing the filesystem
Date Mon, 14 Mar 2016 06:28:33 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192814#comment-15192814
] 

Yi Li commented on HADOOP-10048:
--------------------------------

Hi all, we have integrated this patch into our internal build (based on hadoop-2.5-cdh5.3.2)
and have run it on two clusters (tens/hundreds of nodes each) for more than a month. So far we
have seen no abnormal behavior and no jobs stuck in the SHUFFLE phase (before the patch, a single
fetch attempt could hang for several hours under heavy load), and we have observed ShuffleHandler
throughput up to 10x higher than before (based on ShuffleMetrics).
Is the patch good to go? Thanks.

> LocalDirAllocator should avoid holding locks while accessing the filesystem
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-10048
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10048
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-10048.patch
>
>
> As noted in MAPREDUCE-5584 and HADOOP-7016, LocalDirAllocator can be a bottleneck for
> multithreaded setups like the ShuffleHandler.  We should consider moving to a lockless design
> or minimizing the critical sections to a very small amount of time that does not involve I/O
> operations.
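
For readers unfamiliar with the idea in the description, here is a minimal sketch of keeping filesystem access out of the critical section. It is not the attached patch and not the real LocalDirAllocator code: the class DirPicker, its methods, and the round-robin policy are all hypothetical and only illustrate the pattern. The lock covers nothing but copying a reference and advancing a counter; the directory and free-space checks, which can block on a slow disk, run after the lock is released.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not the actual LocalDirAllocator. */
final class DirPicker {
    private final Object lock = new Object();
    private List<File> dirs;   // guarded by lock; always an immutable list
    private int next = 0;      // guarded by lock; round-robin start index

    DirPicker(List<File> dirs) {
        this.dirs = Collections.unmodifiableList(new ArrayList<>(dirs));
    }

    /** Swap in a new directory list; in-memory work only, so the lock is brief. */
    void setDirs(List<File> newDirs) {
        List<File> copy = Collections.unmodifiableList(new ArrayList<>(newDirs));
        synchronized (lock) {
            dirs = copy;
            next = 0;
        }
    }

    /** Returns a directory with at least requiredBytes usable, or null if none. */
    File pick(long requiredBytes) {
        final List<File> snapshot;
        final int start;
        synchronized (lock) {
            // Critical section: copy a reference and bump a counter. No I/O here.
            snapshot = dirs;
            start = next;
            next = snapshot.isEmpty() ? 0 : (next + 1) % snapshot.size();
        }
        // Filesystem calls happen outside the lock, so one slow disk
        // does not stall every other thread that wants a directory.
        for (int i = 0; i < snapshot.size(); i++) {
            File candidate = snapshot.get((start + i) % snapshot.size());
            if (candidate.isDirectory() && candidate.getUsableSpace() >= requiredBytes) {
                return candidate;
            }
        }
        return null;
    }
}
```

The trade-off in this sketch is that a caller may probe a snapshot that is slightly stale if setDirs runs concurrently. That is usually acceptable, since free space can change at any moment regardless of locking.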



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
