hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5957) Provide support for different mmap cache retention policies in ShortCircuitCache.
Date Thu, 20 Feb 2014 23:53:23 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907731#comment-13907731 ]

Chris Nauroth commented on HDFS-5957:

bq. Chris Nauroth: mmap() does take up physical memory, assuming those pages are mapped into
RAM and are not disk-resident.

Yes, most definitely.  I think Colin was trying to clarify that the initial mmap call dings
virtual memory: call mmap for a 1 MB file and you'll immediately see virtual memory increase
by 1 MB, but not physical memory.  Certainly as the pages get accessed and mapped in, we'll
start to consume physical memory.
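
To make the virtual versus physical distinction concrete, here is a minimal Java sketch. It is an illustration only, not code from the HDFS client: the class name {{MmapResidencyDemo}}, the temp file, and the 4096-byte page size are all assumptions. It maps a 1 MB file, which reserves address space right away, and then touches one byte per page, which is the step that makes the kernel back the mapping with physical pages. {{MappedByteBuffer.isLoaded()}} is only a hint, so the sketch prints it without relying on its value.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapResidencyDemo {
    static final int SIZE = 1 << 20;     // 1 MB, as in the example above
    static final int PAGE_SIZE = 4096;   // assumed page size; varies by platform

    // Maps a 1 MB temp file read-only, touches one byte per page,
    // and returns the number of pages touched.
    static int mapAndTouch() throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[SIZE]);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The map call reserves SIZE bytes of virtual address space.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);
            // Residency hint only; the JDK does not guarantee its accuracy.
            System.out.println("isLoaded() right after map: " + buf.isLoaded());
            int pages = 0;
            for (int i = 0; i < SIZE; i += PAGE_SIZE) {
                buf.get(i);              // first access to a page maps it in
                pages++;
            }
            System.out.println("isLoaded() after touching pages: " + buf.isLoaded());
            return pages;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("pages touched: " + mapAndTouch());
    }
}
```

Note that {{MappedByteBuffer}} has no public unmap method; the region is released only once the buffer is garbage collected, which is one reason explicit control over {{munmap}} matters to clients.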

bq. For small 200Gb data-sets (~1.4x tasks per container), ZCR does give a perf boost because
we get to use HADOOP-10047 instead of shuffling it between byte[] buffers for decompression.

Thanks, that clarifies why zero-copy read was still useful.

It sounds like you really do need a deterministic way to trigger the {{munmap}} calls, i.e.,
either LRU caching or no caching at all, as described above.
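
As a rough sketch of what a strictly bounded policy could look like, the class below wraps an access-ordered {{LinkedHashMap}} and runs a release callback as soon as an entry is evicted. This is not the {{ShortCircuitCache}} implementation and not a proposed patch; {{BoundedLruCache}} and {{onEvict}} are hypothetical names, and the callback stands in for whatever would actually perform the {{munmap}}. A capacity of 0 gives the "no caching at all" behavior, since every entry is released immediately after insertion.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical illustration: a size-bounded LRU cache that releases
// evicted values synchronously, on the thread that caused the eviction.
public class BoundedLruCache<K, V> {
    private final LinkedHashMap<K, V> map;
    private final Consumer<V> onEvict;

    public BoundedLruCache(int capacity, Consumer<V> onEvict) {
        this.onEvict = onEvict;
        // accessOrder = true makes iteration order least-recently-used first.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > capacity) {
                    onEvict.accept(eldest.getValue()); // deterministic release
                    return true;
                }
                return false;
            }
        };
    }

    public synchronized V get(K key) {
        return map.get(key);
    }

    public synchronized void put(K key, V value) {
        V old = map.put(key, value);
        if (old != null && old != value) {
            onEvict.accept(old); // a replaced value is released as well
        }
    }

    public synchronized int size() {
        return map.size();
    }
}
```

The trade-off against the current background-thread expiry is that the release cost lands on the caller of {{put}}, in exchange for a hard upper bound on the number of live mappings.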

> Provide support for different mmap cache retention policies in ShortCircuitCache.
> ---------------------------------------------------------------------------------
>                 Key: HDFS-5957
>                 URL: https://issues.apache.org/jira/browse/HDFS-5957
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Chris Nauroth
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by multiple reads
> of the same block or by multiple threads.  The eventual {{munmap}} executes on a background
> thread after an expiration period.  Some client usage patterns would prefer strict bounds
> on this cache and deterministic cleanup by calling {{munmap}}.  This issue proposes additional
> support for different caching policies that better fit these usage patterns.

This message was sent by Atlassian JIRA