From: Harsh J
Date: Sat, 23 Mar 2013 10:44:57 +0530
Subject: Re: how to control (or understand) the memory usage in hdfs
To: user@hadoop.apache.org

I run a DN with a 128 MB heap for my simple purposes on my Mac, and it
runs well for the load I apply to it.

A DN's primary, growing memory consumption comes from the number of
blocks it carries. All of these blocks' file paths are mapped and kept
in RAM for the DN's lifetime. If your DN has acquired a lot of blocks
by now, say close to a million or more, then 1 GB may no longer suffice
to hold them, and you'd need to scale up (increase the heap size,
adding RAM if you need more) or scale out (add another node and run the
balancer). Sketches of the heap knob and a block-count check follow
below.
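The DN heap itself is usually raised in conf/hadoop-env.sh (in the 1.x
layout). A minimal sketch, with example values rather than
recommendations:

    # conf/hadoop-env.sh
    # HADOOP_HEAPSIZE sets the default max heap, in MB, for all
    # daemons started from this install.
    export HADOOP_HEAPSIZE=1000
    # HADOOP_DATANODE_OPTS appends JVM flags to the DataNode alone,
    # so an -Xmx here overrides the default just for the DN.
    export HADOOP_DATANODE_OPTS="-Xmx2g $HADOOP_DATANODE_OPTS"

The DN has to be restarted for a heap change to take effect.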
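To check whether block count is really what is eating the heap, fsck
reports the filesystem's total block count (on a single-node setup the
lone DN holds all of them). A rough sketch; the exact output wording
can vary across versions:

    # Print just the block-count line from the fsck summary.
    bin/hadoop fsck / | grep -i 'total blocks'

If that number is up in the hundreds of thousands or millions against
a 1 GB heap, the block map is a plausible culprit: each block costs
the DN some heap for its path and metadata, so the requirement grows
roughly linearly with the blocks held.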
On Sat, Mar 23, 2013 at 10:03 AM, Ted wrote:
> Hi, I'm new to hadoop/hdfs and I'm just running some tests on my
> local machine in a single-node setup. I'm encountering out-of-memory
> errors on the JVM running my data node.
>
> I'm pretty sure I can just increase the heap size to fix the errors,
> but my question is about how memory is actually used.
>
> As an example, with other things like an OS's disk cache or, say,
> databases, if you let one use 1 GB of RAM, it will "work" with what
> it has available; if the data exceeds 1 GB, it just swaps between
> memory and disk more often, i.e. the cached portion is smaller. If
> you give it 8 GB of RAM it functions the same way, just with better
> performance.
>
> With my HDFS setup this does not appear to be true: if I allocate it
> a 1 GB heap, it doesn't just perform worse or swap data to disk more
> often. It outright fails with an out-of-memory error and shuts the
> data node down.
>
> So my question is: how do I really tune the memory, or decide how
> much memory I need, to prevent shutdowns? Is 1 GB just too small
> even in a single-machine test environment with almost no data at
> all, or is it supposed to work like an OS disk cache, where it
> always works but just performs better or worse, and I simply have
> something configured wrong? Basically, my objective isn't
> performance; it's that the server must not shut itself down. It can
> slow down, but not shut off.
>
> --
> Ted.

--
Harsh J