hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: EC2 Elastic MapReduce HBase install recommendations
Date Thu, 09 May 2013 04:12:05 GMT
To add to what Andy said - the key to getting HBase running well in AWS is:

1. Choose the right instance types. I usually recommend the HPC
instances or, now that they are available, the high storage density
instances. Those will give you the best performance.

2. Use the latest Amazon Linux AMIs and the latest HBase and HDFS
versions that work with each other.

3. Tune HBase for your workload. You have to do this anyway, but HBase
on AWS is less forgiving than on premises.

I've personally tested up to 10k req/sec/server writing 1K payloads on
HBase 0.92 (that's old!) on HPC instances.
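
For scale, the figures in this thread can be put side by side with some
quick arithmetic. This is only a rough sketch. It assumes that "1K" means
1024 bytes and that the 2500 rows/sec reported further down in the thread
is a cluster-wide figure spread evenly over the two slave nodes; neither
assumption is confirmed anywhere in the thread.

```python
# Back-of-the-envelope comparison of the write rates mentioned in this thread.
# Assumptions (not stated in the thread): "1K" = 1024 bytes, and the
# 2500 rows/sec figure is cluster-wide, split evenly over 2 slave nodes.

PAYLOAD_BYTES = 1024
HPC_REQS_PER_SEC_PER_SERVER = 10_000
M1_LARGE_ROWS_PER_SEC_CLUSTER = 2_500
M1_LARGE_SLAVE_NODES = 2


def payload_mib_per_sec(reqs_per_sec: float, payload_bytes: int) -> float:
    """Raw payload volume in MiB/s, ignoring keys, WAL and replication."""
    return reqs_per_sec * payload_bytes / (1024 * 1024)


def per_server_rate(cluster_rate: float, servers: int) -> float:
    """Average rate per server if the load is spread evenly."""
    return cluster_rate / servers


hpc_mib = payload_mib_per_sec(HPC_REQS_PER_SEC_PER_SERVER, PAYLOAD_BYTES)
m1_rate = per_server_rate(M1_LARGE_ROWS_PER_SEC_CLUSTER, M1_LARGE_SLAVE_NODES)
ratio = HPC_REQS_PER_SEC_PER_SERVER / m1_rate

print(f"HPC instances: about {hpc_mib:.1f} MiB/s of payload per server")
print(f"m1.large setup: about {m1_rate:.0f} rows/s per server")
print(f"Per-server request rate ratio: about {ratio:.0f}x")
```

The result counts payload bytes only. Row keys, the write-ahead log and
HDFS replication all add to the real IO load, so treat it as a lower bound.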

On May 8, 2013, at 9:05 PM, Andrew Purtell <apurtell@apache.org> wrote:

> M7 is not Apache HBase, or any HBase. It is a proprietary NoSQL datastore
> with (I gather) an Apache HBase compatible Java API.
> As for running HBase on EC2, we recently discussed some particulars, see
> the latter part of this thread: http://search-hadoop.com/m/rI1HpK90gu where
> I hijack it. I wouldn't recommend launching HBase as part of an EMR flow
> unless you want to use it only for temporary random access storage, and in
> which case use m2.2xlarge/m2.4xlarge instance types. Otherwise, set up a
> dedicated HBase backed storage service on high I/O instance types. The
> fundamental issue is that IO performance on the EC2 platform is fair to poor.
> I have also noticed a large difference in baseline block device latency if
> using an old Amazon Linux AMI (< 2013) or the latest AMIs from this year.
> Use the new ones, they cut the latency long tail in half. There were some
> significant kernel level improvements I gather.
> On Wed, May 8, 2013 at 10:42 AM, Marcos Luis Ortiz Valmaseda <
> marcosluis2186@gmail.com> wrote:
>> I think that when you talk about RMap, you are referring to
>> MapR's distribution.
>> I think that MapR's team released a very good version of its Hadoop
>> distribution focused on HBase called M7. You can see its overview here:
>> http://www.mapr.com/products/mapr-editions/m7-edition
>> But this release was in beta testing, and I see that it's not included
>> in the Amazon Marketplace yet:
>> https://aws.amazon.com/marketplace/seller-profile?id=802b0a25-877e-4b57-9007-a3fd284815a5
>> 2013/5/7 Pal Konyves <paul.konyves@gmail.com>
>>> Hi,
>>> Has anyone got some recommendations about running HBase on EC2? I am
>>> testing it, and so far I am very disappointed with it. I did not change
>>> anything about the default 'Amazon distribution' installation. It has one
>>> master node and two slave nodes, and write performance is around 2500
>>> small rows per sec at most, but I expected it to be way better. Oh, and
>>> this is with batch put operations with autocommit turned off, where each
>>> batch contains about 500-1000 rows... When I do it with autocommit, it
>>> does not even reach 1000 rows per sec.
>>> All nodes were m1.large instances.
>>> Any experiences, suggestions? Is it worth trying the RMap distribution
>>> instead of the Amazon one?
>>> Thanks,
>>> Pal
>> --
>> Marcos Ortiz Valmaseda
>> Product Manager at PDVSA
>> http://about.me/marcosortiz
> --
> Best regards,
>   - Andy
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
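
The gap reported in the original question between batched puts and
autocommit comes in large part from how many round trips the client makes.
The toy model below is not HBase client code: `BufferedWriter`,
`flush_threshold` and the round-trip counter are hypothetical names made up
for illustration. It only shows the counting argument. With autocommit,
every put is sent on its own; with it turned off, the client buffers puts
and sends them in groups.

```python
# Toy model of client-side write buffering. It does not talk to HBase;
# all names here are hypothetical and exist only for this illustration.

class BufferedWriter:
    """Collects rows and counts simulated round trips to a server."""

    def __init__(self, flush_threshold: int):
        # flush_threshold = 1 models autocommit: each put is sent at once.
        if flush_threshold < 1:
            raise ValueError("flush_threshold must be >= 1")
        self.flush_threshold = flush_threshold
        self.pending = []
        self.round_trips = 0
        self.rows_sent = 0

    def put(self, row):
        """Buffer one row and flush once the threshold is reached."""
        self.pending.append(row)
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Send all buffered rows in a single simulated round trip."""
        if not self.pending:
            return
        self.round_trips += 1
        self.rows_sent += len(self.pending)
        self.pending = []


def write_rows(n_rows: int, flush_threshold: int) -> BufferedWriter:
    writer = BufferedWriter(flush_threshold)
    for i in range(n_rows):
        writer.put(("row-%d" % i, b"x" * 100))
    writer.flush()  # send whatever is still sitting in the buffer
    return writer


autocommit = write_rows(10_000, flush_threshold=1)
batched = write_rows(10_000, flush_threshold=500)
print("autocommit round trips:", autocommit.round_trips)
print("batched round trips:   ", batched.round_trips)
```

Fewer round trips help only until something else becomes the bottleneck.
On EC2, as noted above in the thread, that is often IO performance.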
