hadoop-mapreduce-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: ProcFsBasedProcessTree and clean pages in smaps
Date Thu, 04 Feb 2016 18:20:21 GMT
Hello Jan,

I am moving this thread from user@hadoop.apache.org to
yarn-dev@hadoop.apache.org, since it is less a question of general usage
and more a question of internal implementation details and possible
enhancements.

I think the issue is that it's not guaranteed in the general case that
Private_Clean pages are easily evictable from page cache by the kernel.
For example, if the pages have been pinned into RAM by calling mlock [1],
then the kernel cannot evict them.  Since YARN can execute any code
submitted by an application, including possibly code that calls mlock, it
takes a cautious approach and assumes that these pages must be counted
towards the process footprint.  Although your Spark use case won't mlock
the pages (I assume), YARN doesn't have a way to identify this.
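
For illustration, here is a minimal sketch of container code that pins
memory with mlock(2). It assumes JNA and a 64-bit Linux libc, neither of
which is implied by this thread, and the class and names are made up for
the example; the same call could just as well target a read-only file
mapping, which is exactly the Private_Clean case above.

import com.sun.jna.Library;
import com.sun.jna.Memory;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

// Hypothetical container code that pins its own memory via libc mlock(2).
public class PinnedPages {
    interface CLib extends Library {
        CLib INSTANCE = Native.load("c", CLib.class);
        int mlock(Pointer addr, long len);    // size_t ~ long on 64-bit Linux
        int munlock(Pointer addr, long len);
    }

    public static void main(String[] args) {
        long len = 16 * 4096;                 // a few pages, to stay under RLIMIT_MEMLOCK
        Memory region = new Memory(len);
        region.setByte(0, (byte) 1);          // touch the region so it is resident

        // Once mlock succeeds, the kernel cannot evict these pages until they
        // are munlock'ed or the process exits, so a monitor that wants a safe
        // upper bound has to charge them to the process.
        int rc = CLib.INSTANCE.mlock(region, len);
        System.out.println("mlock returned " + rc + (rc == 0 ? " (pages pinned)" : " (failed)"));

        if (rc == 0) {
            CLib.INSTANCE.munlock(region, len);
        }
    }
}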

Perhaps there is room for improvement here.  If there is a reliable way to
count only mlock'ed pages, then perhaps that behavior could be added as
another option in ProcfsBasedProcessTree.  Off the top of my head, I can't
think of a reliable way to do this, and I can't research it further
immediately.  Do others on the thread have ideas?
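
For reference, the accounting under discussion boils down to summing
per-mapping fields from /proc/<pid>/smaps. The following is a simplified
sketch of that summation, not the actual ProcfsBasedProcessTree code;
the field names are as they appear in smaps, and the way they are
combined here is only illustrative.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of smaps-based accounting, for illustration only.
public class SmapsSketch {
    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        Map<String, Long> totalsKb = new HashMap<>();

        for (String line : Files.readAllLines(Paths.get("/proc/" + pid + "/smaps"))) {
            // Field lines look like "Private_Clean:      1234 kB";
            // mapping header lines and flag lines are skipped.
            int colon = line.indexOf(':');
            if (colon < 0 || !line.endsWith("kB")) {
                continue;
            }
            String key = line.substring(0, colon);
            long kb = Long.parseLong(line.substring(colon + 1, line.length() - 2).trim());
            totalsKb.merge(key, kb, Long::sum);
        }

        long privateClean = totalsKb.getOrDefault("Private_Clean", 0L);
        long privateDirty = totalsKb.getOrDefault("Private_Dirty", 0L);

        // The conservative view counts clean private pages too; Jan's patch
        // effectively drops the privateClean term from a total like this one.
        System.out.println("Private_Dirty                 = " + privateDirty + " kB");
        System.out.println("Private_Dirty + Private_Clean = " + (privateDirty + privateClean) + " kB");
    }
}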

--Chris Nauroth

[1] http://linux.die.net/man/2/mlock




On 2/4/16, 5:11 AM, "Jan Lukavský" <jan.lukavsky@firma.seznam.cz> wrote:

>Hello,
>
>I have a question about the way LinuxResourceCalculatorPlugin calculates
>the memory consumed by a process tree (the calculation is done by the
>ProcfsBasedProcessTree class). When we enable disk caching in Apache
>Spark jobs running on a YARN cluster, the node manager starts killing
>containers while they read the cached data, with "Container is running
>beyond memory limits ...". The reason is that even when we enable
>parsing of the smaps file
>(yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled),
>ProcfsBasedProcessTree counts mmapped read-only pages as consumed by the
>process tree, and Spark uses FileChannel.map(MapMode.READ_ONLY) to read
>the cached data. The JVM then consumes *a lot* more memory than the
>configured heap size (and this cannot really be controlled), but in my
>opinion this memory is not really consumed by the process; the kernel
>can reclaim these pages if needed. My question is: is there an explicit
>reason why "Private_Clean" pages are counted as consumed by the process
>tree? I patched ProcfsBasedProcessTree not to count them, but I don't
>know whether this is the "correct" solution.
>
>Thanks for opinions,
>  cheers,
>  Jan
>
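
The access pattern described above reduces to a read-only file mapping
that is then read through. Below is a minimal, self-contained sketch of
that pattern (not Spark's actual code; the file path is just a
command-line argument), showing how a process with a modest heap can end
up with far more mapped, clean memory attributed to it.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Maps a file read-only and reads through it, the way a cache reader would.
public class ReadOnlyMapping {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(
                Paths.get(args[0]), StandardOpenOption.READ)) {
            // A single MappedByteBuffer is capped at 2 GB, so map at most that much.
            long len = Math.min(ch.size(), Integer.MAX_VALUE);
            MappedByteBuffer buf = ch.map(MapMode.READ_ONLY, 0, len);

            // Reading through the mapping pulls file pages into the page cache;
            // they stay clean (never written), yet smaps attributes them to this
            // process, so the monitored footprint grows well beyond -Xmx.
            long sum = 0;
            for (long i = 0; i < buf.limit(); i += 4096) {
                sum += buf.get((int) i);
            }
            System.out.println("touched " + len + " mapped bytes, sum=" + sum);
        }
    }
}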


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org

