hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject RE: cost estimation
Date Fri, 11 Mar 2011 01:18:00 GMT
Hi Peter,

We boot the master first, then boot the slaves after the master's IP address is known. 

Instances are initialized using user-data scripts. 

We do substitutions on config details when creating the user-data for the instances.
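As a rough illustration of that substitution step (not our actual scripts), here is a minimal Python sketch. The template text, the MASTER_HOST variable, and the /etc/profile.d/cluster.sh path are all hypothetical; the only idea taken from the above is that the master's address is filled into the user-data before the slaves are launched.

```python
from string import Template

# Hypothetical user-data template. The variable name and file path are
# assumptions for illustration only.
USER_DATA_TEMPLATE = Template(
    "#!/bin/sh\n"
    "# Generated when the slave is launched, after the master address is known.\n"
    "echo 'export MASTER_HOST=${master_host}' >> /etc/profile.d/cluster.sh\n"
)


def render_user_data(master_host: str) -> str:
    """Return a user-data script with the master's address filled in."""
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so a half-rendered script is never handed to an instance.
    return USER_DATA_TEMPLATE.substitute(master_host=master_host)


if __name__ == "__main__":
    # 10.0.0.5 stands in for whatever address the master was given at boot.
    print(render_user_data("10.0.0.5"))
```

The rendered string is what would be passed as user-data when launching each slave.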

This approach is sufficient for transient/testing clusters. For a cluster that would run
for a long time or need to be reliable, you'd want a plan for when the master instance goes
away. I think what would be relatively easy to do is grab an elastic IP (which gives you a
"well known" DNS name also), assign it to the current master, then use RedHat Cluster Suite
or similar with another instance as a hot spare, with DRBD replication of the fsimage from
primary to secondary. Then the script which handles loss of the primary can remap the
elastic IP and start a namenode on the secondary.
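The last step of such a failover script might look roughly like the sketch below. It assumes the client object exposes an associate_address call that takes AllocationId, InstanceId and AllowReassociation keywords, as boto3's EC2 client does; the function name, its parameters, and the start_namenode callback are hypothetical, and detecting the loss of the primary (the cluster suite's job) is not shown.

```python
def fail_over(ec2_client, allocation_id, standby_instance_id, start_namenode):
    """Move the elastic IP to the hot spare, then start a namenode there.

    ec2_client     -- assumed to behave like a boto3 EC2 client
    allocation_id  -- allocation ID of the elastic IP (hypothetical value)
    start_namenode -- hypothetical callable that starts the namenode on
                      the given instance, e.g. over ssh
    """
    # Remap the well-known address so clients reach the standby.
    ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=standby_instance_id,
        AllowReassociation=True,
    )
    # Only start the namenode after the address points at the standby.
    start_namenode(standby_instance_id)
```

The ordering is a design choice for this sketch: the address is remapped first so that nothing can reach a half-started namenode on the spare.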

Best regards,

    - Andy

> From: Peter Haidinyak <phaidinyak@local.com>
> Subject: RE: cost estimation
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Date: Thursday, March 10, 2011, 3:46 PM
> I just took a day course on the Amazon Cloud, and the instructor
> mentioned that every time you spin up a VM it gets a different IP
> and host name. If this is true, how do you keep the configuration
> files current every time you add a new VM or power on an existing
> cluster?
> 
> Thanks
> 
> -Pete
> 
> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> 
> Sent: Thursday, March 10, 2011 3:31 PM
> To: user@hbase.apache.org
> Subject: Re: cost estimation
> 
> Everything Gary said.
> 
> Something interesting Netflix said this week at the ccevent
> conference was they were able to depreciate Reserved
> Instance payments as a capital expenditure.
> 
> Also, c1.xlarge is one of only three instance types that
> seem to get their own physical server for each instance
> (others are m2.4xlarge and cc1.4xlarge iirc).
> 
> > From: Gary Helmling <ghelmling@gmail.com>
> > Subject: Re: cost estimation
> > To: user@hbase.apache.org
> > Date: Thursday, March 10, 2011, 9:37 AM
> > Hi Weishung,
> > 
> > See the EC2 instance pricing details here:
> > http://aws.amazon.com/ec2/#pricing
> > 
> > and try to calculate it out vs. price quotes for hardware.
> > 
> > You'll need to run at _least_ m1.large or c1.xlarge instances for HBase.
> > There was a recent discussion thread covering EC2 performance.  You can
> > look it up at search-hadoop.com.
> > 
> > If you don't need the cluster running 24x7, maybe you can make the EC2
> > pricing work out.  Just be aware that you'll be taking a hit in raw IO
> > performance per node, so you may need to balance that out with more nodes
> > than you would need using your own hardware.  If you need to persist
> > data between cluster restarts, you'll also need either EBS or S3 storage,
> > so be sure to factor that in.  Also factor in bandwidth costs if you need
> > to transfer a lot of data in/out of AWS.
> > 
> > My own impression is that EC2 is great and very cost effective for short
> > lived, on-demand computing resources.  We use it a great deal for
> > functional testing.  For 24x7 services, it seems like you pay a premium
> > long term over owning your own hardware, with the advantage of no large
> > up-front cost for acquisition and access to easy elasticity to expand to
> > meet demand, but with a cost of reduced performance per node due to
> > virtualization.
> > 
> > Best advice I can give is do some benchmarking to see how many nodes you
> > need to satisfy your processing requirements in EC2 vs on raw hardware and
> > try to comparatively price it out.
> > 
> > --gh
> > 
> > On Thu, Mar 10, 2011 at 9:12 AM, Weishung Chung <weishung@gmail.com> wrote:
> > 
> > > I am trying to estimate the cost of hosting my own HBase cluster vs
> > > using EC2. Could anyone give me some guidance?
> > > Cluster size ~ 6 to 8 nodes
> > > Usage ~ at least 12 hours/day with a lot of read/write operations.
> > > (I know I need to have more concrete usage numbers here)
> > >
> > > Thank you so much :)
> > >
> > 