hbase-user mailing list archives

From Ronen Itkin <ro...@taykey.com>
Subject Re: Cloudera BASE (+ZooKeeper), Hadoop HDFS, MapReduce, EC2 instances selection
Date Fri, 16 Sep 2011 11:52:04 GMT
Thanks Gary!!


On Thu, Sep 15, 2011 at 10:34 PM, Gary Helmling <ghelmling@gmail.com> wrote:

> Running on EC2 has been discussed on the list quite a bit in the past, so
> you might want to do some searches on the archives.  Here are a few threads
> I pulled up:
>
> http://search-hadoop.com/m/paQmKTxSgj
>
> http://search-hadoop.com/m/7E9PaA6U1V
>
> http://search-hadoop.com/m/sGXTATdlIg2
>
> For instance types, it appears that only c1.xlarge, m2.4xlarge and
> cc1.4xlarge instances will get you a physical server for each instance, so
> you will pay the least IO virtualization "tax" using these with instance
> storage.  But even with that expect reduced IO performance vs physical
> hardware.
>
> For the node layout, I'd suggest something like:
>
> 1 - NameNode, JobTracker, ZooKeeper, HMaster
> 1 - SecondaryNameNode, HMaster
> 3 - DataNode, TaskTracker, RegionServer
>
> You could run more ZK instances on smaller instance types (m1.medium?), but
> beware that these could be more subject to erratic IO throughput due to
> other instances running on the same physical server, which could negatively
> impact zookeeper performance and overall cluster stability.  So for a
> cluster this small, I don't think I would bother.
>
> For instance types, it'll depend on your workload and memory requirements.
> I usually use c1.xlarge for HBase testing, but those have somewhat limited
> memory, so you'll be constrained on the number of MR tasks you can run
> without overcommitting memory (you want to avoid swapping at all costs).
>
> I would say to do some testing with your workload and see what instance
> types give you the best performance at an acceptable price.
>
> --gh
>
>
> On Thu, Sep 15, 2011 at 2:01 AM, Ronen Itkin <ronen@taykey.com> wrote:
>
> >  Hi,
> >
> > I am wondering if someone can recommend best practices for selecting
> > the right Amazon EC2 instance combination for the following
> > implementation:
> >
> > Cloudera Hadoop HDFS and MapReduce:
> >
> >   - 1 NameNode + JobTracker server.
> >   - 1 SecondaryNameNode server.
> >   - 3 DataNodes + TaskTrackers.
> >
> >
> > Cloudera HBase:
> >
> >   - 2 HMaster servers
> >   - 3 ZooKeeper Servers
> >   - 2 Region Servers.
> >
> >
> > From your own experience what AMAZON EC2 instances should I choose?
> > How would you combine and place the above implementation across the
> > instances?
> > Should I place datanode & task tracker with HRegionServer on the same
> > instance?
> >
> > Thanks !
> >
> > --
> > *
> > Ronen.*
> >
> > <http://www.taykey.com/>
> >
>
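
The memory point in the quoted reply (limit the number of MR tasks so the node never swaps) comes down to simple arithmetic. Here is a rough sketch of it. Every number is an assumption for illustration, not a recommendation from the thread: roughly 7 GB of RAM on a c1.xlarge, made-up heap sizes for the co-located daemons, and a hypothetical helper named max_task_slots.

```python
def max_task_slots(total_mb, reserved_mb, per_task_mb):
    """Return how many MapReduce task JVMs fit in RAM without overcommitting.

    total_mb    -- physical memory on the node, in MB
    reserved_mb -- dict of memory set aside for the OS and daemons, in MB
    per_task_mb -- heap given to each map or reduce task JVM, in MB
    """
    free_mb = total_mb - sum(reserved_mb.values())
    if free_mb <= 0:
        # The daemons alone already overcommit the node.
        return 0
    return free_mb // per_task_mb


# Assumed budget for one DataNode + TaskTracker + RegionServer node.
reserved = {
    "os": 1024,
    "datanode": 1024,
    "tasktracker": 1024,
    "regionserver": 3072,
}

# Assumed: about 7 GB of RAM and 512 MB per task JVM.
slots = max_task_slots(total_mb=7 * 1024, reserved_mb=reserved, per_task_mb=512)
print(slots)
```

The result is the combined number of map and reduce slots to allow on the node. Plugging in the figures for a larger instance type shows quickly how much headroom the extra memory buys.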



-- 
*
Ronen Itkin*
Taykey | www.taykey.com
