Yes, but even with an MR job running, it is only 36GB heap total out of 64GB RAM. This leaves plenty for the OS and caching.
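The 36GB figure follows from the configuration in the original post quoted below: 24 map slots at 1GB each, plus 2GB DataNode, 2GB TaskTracker and 8GB RegionServer heaps. The sketch below works through that arithmetic and also shows the 12-slot setting Marcos suggests. `heap_budget_gb` is a hypothetical helper. Configured heap is only a lower bound on each JVM's resident memory, because thread stacks, permgen and native buffers come on top.

```python
# Per-node JVM heap budget, using the figures quoted in this thread.
# OS overhead and non-heap JVM memory are deliberately not modeled.

def heap_budget_gb(map_slots, slot_heap_gb=1, datanode_gb=2,
                   tasktracker_gb=2, regionserver_gb=8):
    """Return the total configured JVM heap (GB) for one worker node."""
    return map_slots * slot_heap_gb + datanode_gb + tasktracker_gb + regionserver_gb

RAM_GB = 64

for slots in (24, 12):
    heap = heap_budget_gb(slots)
    print(f"{slots} map slots: {heap} GB heap, {RAM_GB - heap} GB left for OS and page cache")
```

With 24 slots this gives 36GB of heap and 28GB left over. With 12 slots it gives 24GB of heap and 40GB left over.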

The problem seems to be the OS preferring to cache rather than give memory to the applications. After I drop the caches and rerun the MR job several times, it runs perfectly fine.

On Dec 8, 2012 7:06 PM, "Marcos Ortiz" <mlortiz@uci.cu> wrote:
Are you sure that 24 map slots is a good number for this machine?
Remember that you have three services (DN, TT and HRegionServer) with
a combined 12 GB of heap.
Try a lower number of map slots (12, for example) and launch your
MR job again.
Can you share your logs on Pastebin?


On Sat 08 Dec 2012 07:09:02 PM CST, Robert Dyer wrote:
Has anyone experienced a TaskTracker/DataNode behaving like the
attached image?

This was during an MR job (which runs often). Note the extremely high
System CPU time. Upon investigating, I saw that out of 64GB RAM the
system had allocated almost 45GB to cache!
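One way to check how much memory the page cache holds is to read the `Cached` field of `/proc/meminfo`, where values are reported in kB. The sketch below parses that file and reports the cache as a share of `MemTotal`. The helper names are hypothetical. The file is only read when it exists, so the code does nothing on non-Linux systems.

```python
import os

def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (usually kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields

def cache_fraction(fields):
    """Share of MemTotal currently held in the page cache ("Cached")."""
    return fields["Cached"] / fields["MemTotal"]

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    cached_gib = info["Cached"] / 1024 ** 2  # kB -> GiB
    print(f"Cached: {cached_gib:.1f} GiB ({cache_fraction(info):.0%} of MemTotal)")
```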

I ran sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync",
which is roughly where the graph goes back to normal (much lower
System, much higher User).
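Here is a rough Python equivalent of that shell one-liner. It is a sketch only. `drop_caches` and its `path` parameter are hypothetical names; the parameter exists only so the function can be pointed at a scratch file for testing. Writing 1 drops the page cache, 2 drops reclaimable slab objects such as dentries and inodes, and 3 drops both. Writing to the real file requires root. The kernel only discards clean pages, so `sync` runs first.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"

def drop_caches(level=3, path=DROP_CACHES):
    """Flush dirty pages, then ask the kernel to drop clean caches.

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # write dirty pages back first; only clean pages can be dropped
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

This only empties the cache for the moment. The kernel refills it as soon as the job starts reading data again.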

This has happened a few times.

I have tried playing with the sysctl vm.swappiness value (default of
60), setting it to 30 (its value when the graph was collected)
and now to 10. I am not sure that helps.
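`vm.swappiness` is exposed as `/proc/sys/vm/swappiness`, which is the file `sysctl` reads and writes. It tunes how readily the kernel swaps out anonymous (application) memory relative to reclaiming page cache. It does not cap how large the cache can grow. The sketch below reads and sets the value. The helper names are hypothetical, and the `path` parameter exists only so the code can be tested against a scratch file.

```python
SWAPPINESS = "/proc/sys/vm/swappiness"

def read_swappiness(path=SWAPPINESS):
    """Return the current vm.swappiness value."""
    with open(path) as f:
        return int(f.read().strip())

def set_swappiness(value, path=SWAPPINESS):
    """Write vm.swappiness, like `sysctl -w vm.swappiness=VALUE` (needs root).

    The change does not survive a reboot; persisting it means adding
    `vm.swappiness = VALUE` to /etc/sysctl.conf.
    """
    if not 0 <= value <= 100:  # documented range for kernels of this era
        raise ValueError("vm.swappiness must be between 0 and 100")
    with open(path, "w") as f:
        f.write(f"{value}\n")
```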

Any ideas? Has anyone else run into this before?

24 cores
64GB RAM
4x 2TB SATA3 HDDs

Running Hadoop 1.0.4, with a DataNode (2GB heap) and a TaskTracker (2GB
heap) on this machine.

24 map slots (1GB heap each), no reducers.

Also running HBase 0.94.2 with a RegionServer (8GB RAM) on this machine.

--
Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>


