hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: how to control (or understand) the memory usage in hdfs
Date Sat, 23 Mar 2013 05:14:57 GMT
I run a 128 MB heap size DN for my simple purposes on my Mac and it
runs well for what load I apply on it.

A DN's primary, growing memory consumption comes from the number of
blocks it carries. The file paths of all these blocks are mapped and
kept in RAM for the lifetime of the process. If your DN has acquired a
lot of blocks by now, say close to a million or more, then 1 GB may no
longer suffice to hold them, and you'd need to either scale up
(increase the heap size, adding RAM first if the machine has none to
spare) or scale out (add another node and run the balancer).
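
Since the heap needed grows with the block count, one way to check
where a DN stands is to count the block files on its disks. The sketch
below is only an illustration, not something from the original reply:
it assumes block files are named blk_<id> and sit next to
blk_<id>_<genstamp>.meta files under the DN data directory, and both
function names are hypothetical. The per-block memory cost is
deliberately left as a parameter, because it is not stated in this
thread and would have to be measured on your own setup. The 128 MB
default baseline simply mirrors the small heap mentioned above.

```python
import os


def count_block_files(data_dir):
    """Count block files (blk_<id>) under data_dir, skipping .meta files.

    Assumption: data_dir is a DataNode data directory from your own
    configuration; the naming scheme is the usual blk_* layout.
    """
    count = 0
    for _root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.startswith("blk_") and not name.endswith(".meta"):
                count += 1
    return count


def estimate_heap_mb(num_blocks, bytes_per_block, base_mb=128):
    """Rough heap estimate in MB: a fixed baseline plus a per-block cost.

    Both bytes_per_block and base_mb are assumptions to be replaced with
    numbers measured on a real DataNode.
    """
    return base_mb + (num_blocks * bytes_per_block) / (1024 * 1024)
```

To use it, pass your own data directory to count_block_files() and feed
the result, together with a measured per-block cost, to
estimate_heap_mb(). If the estimate approaches the configured heap,
that is the point to scale up or scale out.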

On Sat, Mar 23, 2013 at 10:03 AM, Ted <r6squeegee@gmail.com> wrote:
> Hi, I'm new to hadoop/hdfs and I'm just running some tests on my local
> machines in a single node setup. I'm encountering out of memory errors
> on the JVM running my data node.
> I'm pretty sure I can just increase the heap size to fix the errors,
> but my question is about how memory is actually used.
> As an example, with other things like an OS's disk cache or, say,
> databases, if you let it use 1 GB of RAM, it will "work" with what it
> has available. If the data is larger than 1 GB, it just swaps in and
> out of memory/disk more often, i.e. the cached data is smaller. If you
> give it 8 GB of RAM it still functions the same, only performance
> increases.
> With my hdfs setup, this does not appear to be true. If I allocate it
> 1 GB of heap, it doesn't just perform worse or swap data to disk more.
> It outright fails with out of memory and shuts the data node down.
> So my question is: how do I really tune the memory and decide how much
> memory I need to prevent shutdowns? Is 1 GB just too small even on a
> single-machine test environment with almost no data at all, or is it
> supposed to work like OS disk caches, where it always works but just
> performs better or worse, and I just have something configured wrong?
> Basically my objective isn't performance; it's that the server must
> not shut itself down. It can slow down but not shut off.
> --
> Ted.

Harsh J
