accumulo-dev mailing list archives

From John Vines <john.w.vi...@ugov.gov>
Subject Re: memory usage & process distribution
Date Mon, 23 Jul 2012 15:33:47 GMT
On Mon, Jul 23, 2012 at 11:21 AM, Miguel Pereira
<miguelapereira1@gmail.com> wrote:

> Hey guys,
>
> I want to set up a realistic production cluster on Amazon's EC2 and I am
> trying to decide 2 things.
>
>
>    -  Memory usage
>
> If I use one of the example configuration files, say the 512MB does that
> mean that all Accumulo processes will use up a total of 512MB? At least
> this appears to be the case when looking at the accumulo-env.sh.
> This will determine whether I use a small or large instance.
>
>
>
Yes, it sets things up so that all of the Accumulo processes together have a
footprint no bigger than 512MB. Mind you, only one of the example
configurations, the 3GB one, is set up for running in a distributed fashion.
So if you're running multiple nodes, you can raise some of the settings for a
larger footprint, because you won't be running every process on every node.


>    - Process Distribution
>
> Is this a standard configuration? I will start off with a small # of worker
> nodes ( 3-4 ) & hope to use my local machine as a "monitor" for the
> accumulo & ganglia web UIs in order to avoid ssh -X latency.
>
> [ Name Node ] Name Node, Gmond
> [ Secondary NN ] Secondary Name Node, Gmond
> [ Job Tracker ] JobTracker, Gmond
> [ Zookeeper ] Zookeeper
> [ Accumulo Master ] Master, Tracer, Garbage Collector, Gmond, Jmxtrans
> [ Monitor ] Monitor, Gmetad, Gweb
> [ Worker Node ] DataNode, Tasktracker, TabletServer, Logger, Gmond,
> Jmxtrans
>
That looks good to me. Just make sure you configure your MapReduce so that
child memory * (reduce slots + map slots) isn't enough to cause
swapping.
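
As a rough sketch of that check (not part of the original message), the
arithmetic could be written in Python as below. The function names and the
sample numbers are hypothetical; substitute the child heap size, slot counts,
and daemon footprints from your own configuration. Note that it counts heap
only, so the real per-process footprint will be somewhat higher.

```python
def mapreduce_memory_mb(child_heap_mb, map_slots, reduce_slots):
    """Worst-case heap used by MapReduce child JVMs on one worker node."""
    return child_heap_mb * (map_slots + reduce_slots)


def fits_without_swapping(physical_mb, child_heap_mb, map_slots,
                          reduce_slots, other_processes_mb):
    """True if all task JVMs plus the other daemons fit in physical RAM.

    other_processes_mb is whatever you budget for everything else on the
    node (DataNode, TaskTracker, TabletServer, Logger, OS, and so on).
    """
    needed = mapreduce_memory_mb(child_heap_mb, map_slots, reduce_slots)
    return needed + other_processes_mb <= physical_mb


if __name__ == "__main__":
    # Hypothetical worker: 7680 MB of RAM, 512 MB per child JVM,
    # 4 map slots, 2 reduce slots, 3072 MB budgeted for other processes.
    print(mapreduce_memory_mb(512, 4, 2))
    print(fits_without_swapping(7680, 512, 4, 2, 3072))
```

If the check fails, lower the slot counts or the child heap size, or move to
a larger instance, rather than letting the node swap.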

>
> Thanks,
>
> Miguel
>

John
