hbase-user mailing list archives

From Aditya Sharma <adityadsha...@gmail.com>
Subject Re: High variance in results for hbase benchmarking
Date Sun, 06 Mar 2011 18:00:29 GMT
Mark, Gary, Ted,

Thanks for your responses. I will keep the EC2 issues and the other points in
mind when I get a chance to redo the benchmarking. BTW, is there any
recommendation for an on-demand computing provider for benchmarking
purposes?

@Gary,
To answer your questions, I am using the default configuration files (with
the hostnames changed, of course) with Hadoop 0.20.2 and HBase 0.90.1, and the
default replication factor of 3 for HDFS. I am not using EBS because I was
concerned that network latency between the EC2 host and EBS would affect the
benchmarking.
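For reference, the replication setting being relied on is the stock Hadoop
one; a minimal hdfs-site.xml fragment making it explicit would look like this
(a sketch of the shipped default, nothing tuned):

```xml
<!-- hdfs-site.xml: dfs.replication is the HDFS block replication
     factor; 3 is the shipped default, written out explicitly here. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```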


@Ted,
No, I am actually trying to emulate existing application behaviour, so the
insert/upsert code is single threaded. I had used the same configuration for
other datastores like MongoDB and Cassandra, and did not see any marked drop
in performance there, though that could be down to the EC2 hardware
variations.
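One way to put a number on that run-to-run spread before comparing datastores
is the coefficient of variation (stddev / mean) across repeated runs of the
same test; a minimal sketch, where the helper name and the throughput numbers
are purely illustrative, not measured results:

```python
import statistics

def run_variance(throughputs):
    """Coefficient of variation across repeated benchmark runs,
    as a fraction (0.25 == 25% variance between runs)."""
    mean = statistics.mean(throughputs)
    stdev = statistics.stdev(throughputs)  # sample standard deviation
    return stdev / mean

# Illustrative numbers only: ops/sec from five repeats of one test.
runs = [4200, 3100, 5150, 2600, 4800]
print(f"mean={statistics.mean(runs):.0f} ops/s, "
      f"run-to-run variation={run_variance(runs):.0%}")
```

Repeating each benchmark several times and reporting this figure alongside
the mean makes it obvious when the environment, rather than the datastore,
is dominating the measurement.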

Aditya


On Fri, Mar 4, 2011 at 1:58 PM, Andrew Purtell <apurtell@apache.org> wrote:

> > > Since we are using EC2 Large instances, it seems
> > > unlikely that network or some other virtualization
> > > related resources crunch are affecting our
> > > performance measurement.
>
> Your assumptions are wrong. It seems only c1.xlarge and m2.4xlarge may be
> assigned dedicated hardware. Reference:
> http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/
> Their shared disk storage (instance-store) would still be impacted by
> neighbors.
>
> I think the only way you will approach consistent results is if you use the
> cluster compute instances (cc1.4xlarge). These are a completely different
> architecture, HVM instead of PVM, dedicated 10GigE network, dedicated
> physical hosts, etc.
>
> With other instance types I see large variance from day to day even hour to
> hour. In short, EC2 is useless for performance benchmarking. It's very handy
> for a lot of other things though, like functional or smoke testing.
>
> For additional information see:
> http://www.comp.nus.edu.sg/~vldb2010/proceedings/files/papers/E02.pdf .
>
>   - Andy
>
>
> --- On Thu, 3/3/11, Gary Helmling <ghelmling@gmail.com> wrote:
>
> > From: Gary Helmling <ghelmling@gmail.com>
> > Subject: Re: High variance in results for hbase benchmarking
> > To: user@hbase.apache.org
> > Cc: "Aditya Sharma" <adityadsharma@gmail.com>
> > Date: Thursday, March 3, 2011, 11:37 PM
> > On Thu, Mar 3, 2011 at 10:19 PM,
> > Aditya Sharma <adityadsharma@gmail.com>wrote:
> >
> > >
> > > Since we are using EC2 Large instances, it seems
> > > unlikely that network or some other virtualization
> > > related resources crunch are affecting our
> > > performance measurement.
> > >
> > >
> > You are guaranteed to see large variance in results when
> > benchmarking on EC2.  Welcome to the oversubscribed public
> > cloud!  You can run the same test twice with the same
> > instances and still see massive differences.  You should
> > expect at least 25% variance between test runs (in practice
> > I've seen as much as 100% variance myself).
> >
> > Two nodes is a very small cluster to be benchmarking on.
> > The minimum cluster size is typically recommended as something
> > like 1 master node (NN, JT and HBase Master) + 3 slaves (DN,
> > TT and Region Server).  But HBase really works best when you
> > start to approach 10 slaves or
> > more.
> [...]
>
>
>
>
>
