hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5957) Provide support for different mmap cache retention policies in ShortCircuitCache.
Date Sat, 15 Feb 2014 22:09:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902547#comment-13902547 ]

Chris Nauroth commented on HDFS-5957:
-------------------------------------

Here are some additional details on the scenario that prompted filing this issue.  Thanks
to [~gopalv] for sharing them.

Gopal has a YARN application that performs strictly sequential reads of HDFS files.  The application
may rapidly iterate through a large number of blocks.  The reason is that each block
contains a small metadata header, and based on the contents of this metadata, the application
can often decide that there is nothing relevant in the rest of the block.  If so, the
application seeks all the way past that block.  Gopal estimates that this code could
feasibly scan through ~100 HDFS blocks in ~10 seconds.
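
To make the access pattern concrete, here is a rough sketch of the scan-and-skip loop against
the zero-copy read API ({{FSDataInputStream#read(ByteBufferPool, int, EnumSet<ReadOption>)}}).
This is my own illustration, not Gopal's code; {{headerLooksRelevant}} and the 4 KB header size
are made-up placeholders, and the sketch assumes fixed-size blocks.

{code:java}
// Rough sketch (not the actual application) of the scan-and-skip pattern
// using the zero-copy read API.  Each zero-copy read mmaps the block file on
// the client side; the mapping is retained in the ShortCircuitCache even
// after releaseBuffer().
import java.nio.ByteBuffer;
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.ReadOption;
import org.apache.hadoop.io.ElasticByteBufferPool;

public class BlockHeaderScan {
  public static void scan(Path path) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    long len = fs.getFileStatus(path).getLen();
    long blockSize = fs.getFileStatus(path).getBlockSize();
    ElasticByteBufferPool pool = new ElasticByteBufferPool();
    try (FSDataInputStream in = fs.open(path)) {
      for (long blockStart = 0; blockStart < len; blockStart += blockSize) {
        in.seek(blockStart);
        // Read just the small metadata header at the front of the block.
        ByteBuffer header = in.read(pool, 4096,
            EnumSet.of(ReadOption.SKIP_CHECKSUMS));
        try {
          if (header != null && headerLooksRelevant(header)) {
            // ... read the rest of the block ...
          }
          // Otherwise skip straight to the next block boundary.
        } finally {
          if (header != null) {
            in.releaseBuffer(header);
          }
        }
      }
    }
  }

  private static boolean headerLooksRelevant(ByteBuffer header) {
    return false;  // placeholder for the application's metadata check
  }
}
{code}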

This usage pattern, in combination with zero-copy reads, causes retention of a large number of
memory-mapped regions in the {{ShortCircuitCache}}.  Eventually, YARN's resource check kills
the container process for exceeding the enforced physical memory bounds.  The asynchronous
nature of our {{munmap}} calls surprised Gopal, who had carefully calculated his
memory usage to stay within YARN's resource limits.

As a workaround, I advised Gopal to lower {{dfs.client.mmap.cache.timeout.ms}} so that the
{{munmap}} happens sooner.  A better solution would be to provide support in the
HDFS client for a caching policy that fits this usage pattern.  Two possibilities are:

# LRU bounded by a client-specified maximum memory size.  (Note that the bound is on total
memory size, not on the number of files or blocks, because block counts and block sizes can
differ.)  A rough sketch of this option follows the list.
# Do not cache at all, so that only one memory-mapped region is alive at a time.
 The sequential read usage pattern described above always results in a cache miss anyway,
so a cache adds no value.
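
To make option 1 concrete, here is a rough sketch of an LRU bounded by total mapped bytes.
This is purely illustrative and is not the existing {{ShortCircuitCache}} code;
{{ByteBoundedMmapLru}} and {{MappedRegion}} are hypothetical names, and a real implementation
would also have to respect reference counts so that in-use mappings are never unmapped.

{code:java}
// Hypothetical illustration only -- not the ShortCircuitCache implementation.
// Sketches option 1: an LRU of mmapped regions bounded by total mapped bytes
// rather than by entry count.
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class ByteBoundedMmapLru<K> {
  /** Stand-in for an mmapped block region plus its cleanup hook. */
  public interface MappedRegion {
    long length();
    void unmap();      // would call munmap() synchronously
  }

  private final long maxMappedBytes;
  private long mappedBytes = 0;
  private final LinkedHashMap<K, MappedRegion> lru =
      new LinkedHashMap<>(16, 0.75f, true /* access order */);

  public ByteBoundedMmapLru(long maxMappedBytes) {
    this.maxMappedBytes = maxMappedBytes;
  }

  public synchronized void put(K key, MappedRegion region) {
    lru.put(key, region);
    mappedBytes += region.length();
    evictIfOverBudget();
  }

  public synchronized MappedRegion get(K key) {
    return lru.get(key);   // refreshes the entry's LRU position
  }

  private void evictIfOverBudget() {
    // Evict least-recently-used regions until back under the byte budget,
    // but never evict the sole remaining (most recently used) entry.
    Iterator<Map.Entry<K, MappedRegion>> it = lru.entrySet().iterator();
    while (mappedBytes > maxMappedBytes && lru.size() > 1 && it.hasNext()) {
      Map.Entry<K, MappedRegion> eldest = it.next();
      mappedBytes -= eldest.getValue().length();
      eldest.getValue().unmap();
      it.remove();
    }
  }
}
{code}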

I don't propose removing the current time-triggered threshold, because I think that's valid
for other use cases.  I only propose adding support for new policies.

In addition to the caching policy itself, I want to propose a way to run the {{munmap}} calls
synchronously with the caller instead of on a background thread.  This would be a better
fit for clients who want deterministic resource cleanup.  Right now, we have no way to guarantee
that the OS will schedule the {{CacheCleaner}} thread ahead of YARN's resource check thread.
 This isn't a proposal to remove support for the background thread, only to add support for
synchronous {{munmap}}.
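
As a sketch of what synchronous cleanup could look like (again hypothetical, not existing HDFS
code), the client could unmap as soon as the last reference to a replica's mapping is released
whenever such a policy is selected:

{code:java}
// Hypothetical sketch of the synchronous-munmap idea -- not existing HDFS code.
// When a "no-cache / synchronous munmap" policy is selected, the caller's own
// thread unmaps the region as soon as the last reference is released, rather
// than waiting for the background CacheCleaner thread.
public class ReplicaMmapHolder {
  private final boolean synchronousMunmap;   // hypothetical policy switch
  private int refCount = 0;
  private java.nio.MappedByteBuffer mmap;    // null once unmapped

  public ReplicaMmapHolder(java.nio.MappedByteBuffer mmap,
                           boolean synchronousMunmap) {
    this.mmap = mmap;
    this.synchronousMunmap = synchronousMunmap;
  }

  public synchronized void ref() {
    refCount++;
  }

  public synchronized void unref() {
    refCount--;
    if (refCount == 0 && synchronousMunmap && mmap != null) {
      // Deterministic cleanup: memory is released before any external
      // resource check can observe it.
      unmap(mmap);
      mmap = null;
    }
    // Otherwise the mapping stays cached, and the background cleaner (or an
    // LRU policy like the one sketched above) unmaps it later.
  }

  private static void unmap(java.nio.MappedByteBuffer buffer) {
    // Placeholder: the real client unmaps through its native/NIO cleanup path.
  }
}
{code}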

I think you could also make an argument that YARN shouldn't count these memory-mapped regions
towards the container process's RSS.  It's really the DataNode process that owns that memory,
and clients who {{mmap}} the same region shouldn't get penalized.  Let's address that part
separately though.


> Provide support for different mmap cache retention policies in ShortCircuitCache.
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-5957
>                 URL: https://issues.apache.org/jira/browse/HDFS-5957
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by multiple reads
of the same block or by multiple threads.  The eventual {{munmap}} executes on a background
thread after an expiration period.  Some client usage patterns would prefer strict bounds
on this cache and deterministic cleanup by calling {{munmap}}.  This issue proposes additional
support for different caching policies that better fit these usage patterns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
