hbase-user mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: Several questions about running HBase on EC2
Date Wed, 21 Apr 2010 06:16:41 GMT
Some quick thoughts (from a relative newbie):

- if disk space is a problem you could mount some EBS volumes
- building an AMI is really very easy:
http://docs.amazonwebservices.com/AmazonEC2/dg/2006-06-26/creating-an-ami.html
  - why not use a Cloudera image (or RPM), though, and benefit
from the work of others (testing, future upgrades, documentation,
potential support, etc.)?
- if it were me, I would pick the EC2 instance size whose memory matches
the machines you plan to purchase, so you can try out the tuning options
you will use for real
- EC2 seems pretty slow on disk IO.  My Hadoop cluster runs about 4x
the EC2 speed and is made of Dell R200s with 2x500G SATA, and 8GB
- for testing, perhaps start by saturating HBase with heavy traffic
that simulates your app (e.g. read/write) and see how gracefully both
HBase and the clients handle it; then, under normal load, start
dropping servers from the HBase cluster (including ZK) to see how it
handles simulated failure (and the servers coming back up again).
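
A minimal sketch of the kind of load driver the last point describes, assuming a Python client machine. InMemoryStore is a hypothetical stand-in and not an HBase API: it only exists so the script runs on its own, and you would replace its put/get with calls to whatever client you actually use. The worker count, key space and read ratio are also made-up defaults.

```python
import random
import threading
import time


class InMemoryStore:
    """Hypothetical stand-in for a real client; swap in real put/get calls."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._rows[key] = value

    def get(self, key):
        with self._lock:
            return self._rows.get(key)


def run_load(store, workers=4, ops_per_worker=1000, read_ratio=0.5, seed=0):
    """Issue a mixed read/write workload; return throughput, latency, errors."""
    latencies = []
    error_counts = []
    guard = threading.Lock()

    def worker(worker_id):
        rng = random.Random(seed + worker_id)
        local_latencies = []
        local_errors = 0
        for i in range(ops_per_worker):
            key = "row-%d" % rng.randrange(10000)
            start = time.perf_counter()
            try:
                if rng.random() < read_ratio:
                    store.get(key)
                else:
                    store.put(key, "value-%d-%d" % (worker_id, i))
            except Exception:
                # Count failures instead of aborting, so the run shows how
                # the client side behaves while servers are being dropped.
                local_errors += 1
                continue
            local_latencies.append(time.perf_counter() - start)
        with guard:
            latencies.extend(local_latencies)
            error_counts.append(local_errors)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    began = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - began

    latencies.sort()
    ok = len(latencies)
    return {
        "ops_ok": ok,
        "errors": sum(error_counts),
        "ops_per_sec": ok / elapsed if elapsed > 0 else 0.0,
        "p50_ms": latencies[ok // 2] * 1000 if ok else None,
        "p99_ms": latencies[min(ok - 1, int(ok * 0.99))] * 1000 if ok else None,
    }


if __name__ == "__main__":
    print(run_load(InMemoryStore()))
```

Running the same driver once at saturation and once at normal load while you stop and restart servers gives you comparable numbers for both cases; the error count is the part to watch during the failure runs.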



On Wed, Apr 21, 2010 at 7:58 AM, Sean <seanatpurdue@hotmail.com> wrote:
>
> Hi folks,
> I am thinking of building a testing environment for an HBase cluster on EC2, and I plan
> to build such an environment for the following reasons:
> 1) To have a reference throughput/read_latency number for different sizes of HBase cluster.
> 2) To test various schema designs and their performance implications for scan and M/R operations.
> -- After having results from 1 and 2, we can decide how to build the actual physical cluster.
> The reason that we don't want to build a physical cluster in the first place is that I understand
> that building a 4-node cluster does not make much sense for a real load test (we do have
> a rough estimate of how big our data size will be).
> -- At the same time, I hope I can learn enough about high-availability solutions while
> experimenting with 1 and 2.
> Having explained my motivation for this experiment, I'd like to ask several questions:
> a) After reading http://aws.amazon.com/ec2/instance-types/, I believe I should select
> "Standard Instances: Extra Large Instance" as my instance. Though it seems that I should pick
> the "High-Memory Instances" family because we are talking about a memory-hungry application here,
> "High-Memory Instances" probably does not fit my testing environment -- the disk space does
> not look like a good number. Note: after testing in this environment, I will need to use
> the benchmark numbers as a reference to build my actual cluster.
>
> b) I understand Cloudera provides an AMI, but can I build my own? If I can choose to
> do so, can someone give me a pointer? I have successfully built an HBase server on a 4-machine
> cluster; how much further effort (please give me an estimate if you would) do I need to put in
> to achieve this goal?
> c) Here is my testing environment:
>    -- I build an HBase cluster for serving
>    -- then I build several clients for issuing work-load ops
> How can I get to learn the high-availability lessons around this (I know most of the
> high-level ideas, but all subtle issues come from implementation details as we all know,
> especially for a distributed system)?
>
>
> Thanks for any suggestion!
