hadoop-hdfs-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5957) Provide support for different mmap cache retention policies in ShortCircuitCache.
Date Thu, 20 Feb 2014 23:33:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907700#comment-13907700 ]

Gopal V commented on HDFS-5957:

[~cnauroth]: mmap() does take up physical memory, assuming those pages are mapped into RAM
and are not disk-resident.

As long as we're on Linux, it will show up in RSS and will also be counted in the
Shared_Clean/Referenced fields in /proc/<pid>/smaps.
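To see this for a given process, the per-mapping values in smaps can be totalled. The following is a minimal sketch only, not part of Hadoop or YARN: the class name SmapsSummary and the method sumKb are made up for illustration, and it assumes a Linux /proc filesystem where each mapping reports lines such as "Rss:  123 kB".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class SmapsSummary {
    // The smaps fields mentioned above; each is reported once per mapping, in kB.
    static final List<String> FIELDS = Arrays.asList("Rss", "Shared_Clean", "Referenced");

    /** Sums the selected fields over all mappings found in smaps-formatted lines. */
    static Map<String, Long> sumKb(List<String> lines) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String field : FIELDS) {
            totals.put(field, 0L);
        }
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon);
            if (!totals.containsKey(key)) {
                continue; // mapping header lines and other fields are skipped
            }
            // The value part looks like "       1234 kB".
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            totals.put(key, totals.get(key) + Long.parseLong(parts[0]));
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Pass a pid as the first argument, or omit it to inspect this JVM.
        Path smaps = Paths.get("/proc", args.length > 0 ? args[0] : "self", "smaps");
        List<String> lines = Files.readAllLines(smaps, StandardCharsets.ISO_8859_1);
        sumKb(lines).forEach((k, v) -> System.out.println(k + ": " + v + " kB"));
    }
}
```

Comparing the Rss total with the Shared_Clean total gives a rough idea of how much of the resident set is clean, shared, file-backed memory such as mmap'ed block files.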

YARN could do a better job of calculating "How much memory will be freed if this process
is killed" vs "How much memory does this process use". But that is a completely different

When I set the mmap timeout to 1000ms, some of my queries succeeded - mostly the queries which
were taking > 50 seconds. 

But the really fast ORC queries, which take ~10 seconds to run, still managed to hit
~50 task failures out of ~3000 map tasks.

The perf dip happens because of some of these failures.

For small 200Gb data-sets (~1.4x tasks per container), ZCR (zero-copy read) does give a perf
boost because we get to use HADOOP-10047 instead of shuffling data between byte[] buffers for
decompression.

> Provide support for different mmap cache retention policies in ShortCircuitCache.
> ---------------------------------------------------------------------------------
>                 Key: HDFS-5957
>                 URL: https://issues.apache.org/jira/browse/HDFS-5957
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Chris Nauroth
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by multiple reads
> of the same block or by multiple threads.  The eventual {{munmap}} executes on a background
> thread after an expiration period.  Some client usage patterns would prefer strict bounds
> on this cache and deterministic cleanup by calling {{munmap}}.  This issue proposes additional
> support for different caching policies that better fit these usage patterns.
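
The difference between timed expiry on a background thread and a strictly bounded cache with deterministic cleanup can be sketched in a few lines. This is an illustration only and is not the ShortCircuitCache implementation: the class BoundedRetentionCache is hypothetical, and the release callback is a stand-in for {{munmap}}, since plain Java does not expose an explicit unmap call.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of a strictly bounded retention policy.
final class BoundedRetentionCache<K, V> {
    private final int maxEntries;
    private final Consumer<V> release; // stand-in for munmap
    // accessOrder=true makes iteration start at the least recently used entry.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);

    BoundedRetentionCache(int maxEntries, Consumer<V> release) {
        if (maxEntries < 0) {
            throw new IllegalArgumentException("maxEntries must be >= 0");
        }
        this.maxEntries = maxEntries;
        this.release = release;
    }

    synchronized V get(K key) {
        return entries.get(key); // also marks the entry as recently used
    }

    synchronized void put(K key, V value) {
        V previous = entries.put(key, value);
        if (previous != null && previous != value) {
            release.accept(previous);
        }
        // Evict on the caller's thread, so cleanup happens at a known point
        // instead of after a timeout on a background thread.
        Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
        while (entries.size() > maxEntries && it.hasNext()) {
            V evicted = it.next().getValue();
            it.remove();
            release.accept(evicted);
        }
    }

    /** Releases every retained entry immediately. */
    synchronized void clear() {
        Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            V value = it.next().getValue();
            it.remove();
            release.accept(value);
        }
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With maxEntries set to 0, every entry is released as soon as it is stored, which corresponds to a "no retention" policy; larger values trade memory held by mappings for reuse across reads.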

