hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator
Date Wed, 08 Sep 2010 17:46:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907334#action_12907334 ]

Arun C Murthy commented on MAPREDUCE-1904:

A couple of concerns:

# I'd like to understand which part of LocalDirAllocator.getLocalPathToRead is expensive.
It's fine to add a cache, but it's better to do it _after_ we understand why we really need it.
# This patch makes the code path skip the sanity checks in LocalDirAllocator.confChanged,
which is called by LocalDirAllocator.getLocalPathToRead. That is a concern. Again, this might
be the expensive part of LocalDirAllocator.getLocalPathToRead, but we need to confirm that.
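
To make the second concern concrete, here is a minimal sketch of a cache that still reacts to a configuration change instead of bypassing the check entirely. It is not the attached patch. The class name ConfAwarePathCache, its methods, and the idea of representing the local-dir setting as a single string are all assumptions made for illustration. Whether a check like this covers everything LocalDirAllocator.confChanged verifies would still need to be confirmed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache that is dropped whenever the local-dir setting changes. */
class ConfAwarePathCache {
    private final Map<String, String> paths = new ConcurrentHashMap<String, String>();
    // Assumption: the local-dir setting can be compared as one string value.
    private volatile String dirsSnapshot;

    ConfAwarePathCache(String initialDirs) {
        this.dirsSnapshot = initialDirs;
    }

    /** Returns the cached path, or null if absent or the setting has changed. */
    String lookup(String currentDirs, String key) {
        if (!currentDirs.equals(dirsSnapshot)) {
            synchronized (this) {
                // Re-check under the lock so only one thread clears the cache.
                if (!currentDirs.equals(dirsSnapshot)) {
                    paths.clear();
                    dirsSnapshot = currentDirs;
                }
            }
        }
        // Fast path: no lock is taken when the setting is unchanged.
        return paths.get(key);
    }

    /** Stores a path only if it was resolved under the current setting. */
    synchronized void store(String currentDirs, String key, String path) {
        if (currentDirs.equals(dirsSnapshot)) {
            paths.put(key, path);
        }
    }
}
```

With this shape, a cache hit costs one string comparison and one hash lookup, and entries resolved under an older setting are never served.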

Don't get me wrong, the focus of this jira is very useful - we just need to fix it the 'right' way.

> Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator
> -------------------------------------------------------------------------------
>                 Key: MAPREDUCE-1904
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.20.1
>            Reporter: Rajesh Balamohan
>         Attachments: MAPREDUCE-1904-RC10.patch, MAPREDUCE-1904-trunk.patch, profiler output after applying the patch.jpg, TaskTracker- yourkit profiler output .jpg, Thread profiler output showing contention.jpg
> While profiling the tasktracker with the Sort benchmark, it was observed that threads block on LocalDirAllocator.getLocalPathToRead() in order to get the index file and temporary map output file.
> As LocalDirAllocator is tied to the ServletContext, only one instance is available per tasktracker httpserver. Given the jobid & mapid, LocalDirAllocator retrieves the index file path and the temporary map output file path. getLocalPathToRead() is internally synchronized.
> Introducing an LRUCache for this lookup reduces the contention heavily (LRUCache with key = jobid + mapid and value = PATH to the file). The size of the LRUCache can be varied based on the environment, and I observed a throughput improvement on the order of 4-7% with the introduction of the LRUCache.
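
The lookup described above could be sketched roughly as follows. This is an illustration only, not the code from the attached patches: the class name MapOutputPathCache, the "/" separator in the key, and the use of plain strings for paths are assumptions. It relies on the standard java.util.LinkedHashMap access-order mode to get LRU eviction.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache mapping jobid + mapid to a resolved file path. */
class MapOutputPathCache {
    private final Map<String, String> cache;

    MapOutputPathCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently
        // accessed, so the "eldest" entry is the least recently used one.
        Map<String, String> lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
        // In access order even get() reorders the map, so every call must be
        // synchronized. The lock is held only for the duration of a map operation.
        this.cache = Collections.synchronizedMap(lru);
    }

    // Assumed key format; the issue only says key = jobid + mapid.
    static String key(String jobId, String mapId) {
        return jobId + "/" + mapId;
    }

    String get(String jobId, String mapId) {
        return cache.get(key(jobId, mapId));
    }

    void put(String jobId, String mapId, String path) {
        cache.put(key(jobId, mapId), path);
    }

    int size() {
        return cache.size();
    }
}
```

A caller would consult the cache first and fall back to LocalDirAllocator.getLocalPathToRead() only on a miss, storing the result afterwards. The maxEntries value corresponds to the cache size that the description says can be tuned per environment.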

