hbase-dev mailing list archives

From Jim Kellerman <...@powerset.com>
Subject RE: HBase Performance question
Date Tue, 22 Apr 2008 16:14:46 GMT
> -----Original Message-----
> From: Karthik Pattabiraman [mailto:pkarthik@yahoo-inc.com]
> Sent: Tuesday, April 22, 2008 7:31 AM
> To: hbase-dev@hadoop.apache.org
> Subject: HBase Performance question
> Hi,
>     I am evaluating HBase for a serving system. The
> requirements are fairly simple. Each record comprises a key
> and a value (size ~4k).
>     I set up a small cluster consisting of two boxes and the
> number of records inserted into the table is close to 65K.
>    I then ran a Tomcat server on one of the boxes (where the
> master is running). The Tomcat server establishes a
> connection to HBase at startup and then, on each request, queries
> HBase for the record.
>     The benchmarks were not good. I ran the benchmarks for 30
> min with 20 clients (talking to tomcat) and the average
> response time was 51 ms.
> When I increased the number of clients to 50, the average
> response time increased to 110 ms. To ensure that tomcat is
> not the bottleneck, I logged the time taken for each HBase
> request and found that it correlated with the benchmarks (the time
> increased when I increased the number of clients).
>     Any idea as to why this would happen given that the
> number of records is not huge? (FYI: Both the boxes act as
> region servers, the box where tomcat runs, also runs hbase
> master and the dfs namenode)

If you are doing single-record reads (either sequential or random),
HBase currently does not perform that well. See
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation (this
page has not been updated recently; as you can see, the numbers
are improving, but there is a long way to go). Scanning a row range
is by far the highest-performing operation.
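
As a rough reading of the figures quoted above, the sketch below turns
them into a throughput estimate. It assumes the benchmark clients were
closed-loop (each one waits for a response before sending its next
request, with no think time); the original message does not say this,
so treat the result as an estimate only. The class and method names are
made up for illustration.

```java
// Rough closed-loop throughput estimate from the figures in the question.
// Assumption (not stated in the thread): each client sends its next request
// as soon as the previous one returns, so
//   throughput = clients / average response time.
final class ClosedLoopEstimate {

    // Returns the estimated requests per second for a given number of
    // concurrent clients and an average response time in milliseconds.
    static double requestsPerSecond(int clients, double avgLatencyMs) {
        return clients / (avgLatencyMs / 1000.0);
    }

    public static void main(String[] args) {
        double at20 = requestsPerSecond(20, 51.0);   // about 392 requests/s
        double at50 = requestsPerSecond(50, 110.0);  // about 455 requests/s
        System.out.printf("20 clients: %.0f req/s%n", at20);
        System.out.printf("50 clients: %.0f req/s%n", at50);
        System.out.printf("clients x%.1f, throughput x%.2f%n",
                50 / 20.0, at50 / at20);
    }
}
```

Under that assumption, going from 20 to 50 clients (2.5 times as many)
raises throughput only from roughly 392 to roughly 455 requests per
second. That pattern would be consistent with the two-box cluster
already being close to saturation at 20 clients, with extra clients
mostly adding queueing delay.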

HBase is currently best suited for batch-oriented map/reduce type
operations. There are several installations that use it successfully
for this purpose.

The focus for HBase has been, and will remain (until at least release
0.2.0, which is due in about one month), reliability and robustness
(i.e., make it work, and then make it work fast).

The release that follows 0.2.0 will focus on performance issues. We
know of two areas where HBase spends most of its time:
- in RPC calls (but we have not yet broken that time down into
  marshalling, unmarshalling, and the introspection used to make the RPC)
- in the Hadoop FileSystem abstraction (even on local disk, it is
  not as fast as we'd like)
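
One way to start breaking the RPC cost down is to time marshalling and
unmarshalling separately for a payload of the size in question (a key
plus a value of about 4k). The sketch below is only an illustration: it
uses plain java.io streams with a made-up length-prefixed encoding, not
Hadoop's RPC or Writable classes, and the class name, key, and iteration
count are hypothetical. It also leaves out the introspection step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: a stand-in encoding, not the Hadoop RPC wire format.
final class MarshalTiming {

    // Encode key and value as two length-prefixed byte arrays.
    static byte[] marshal(byte[] key, byte[] value) {
        try {
            ByteArrayOutputStream buf =
                    new ByteArrayOutputStream(key.length + value.length + 8);
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(key.length);
            out.write(key);
            out.writeInt(value.length);
            out.write(value);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decode the bytes produced by marshal() back into {key, value}.
    static byte[][] unmarshal(byte[] wire) {
        try {
            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(wire));
            byte[] key = new byte[in.readInt()];
            in.readFully(key);
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            return new byte[][] {key, value};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = "row-00042".getBytes(StandardCharsets.UTF_8);
        byte[] value = new byte[4096];            // ~4k value, as in the question
        Arrays.fill(value, (byte) 'x');

        int iterations = 50_000;                  // arbitrary; includes JIT warm-up
        long marshalNanos = 0;
        long unmarshalNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long t0 = System.nanoTime();
            byte[] wire = marshal(key, value);
            long t1 = System.nanoTime();
            byte[][] back = unmarshal(wire);
            long t2 = System.nanoTime();
            marshalNanos += t1 - t0;
            unmarshalNanos += t2 - t1;
            if (!Arrays.equals(back[1], value)) {
                throw new IllegalStateException("round trip changed the value");
            }
        }
        System.out.printf("marshal:   %.2f us/op%n",
                marshalNanos / 1000.0 / iterations);
        System.out.printf("unmarshal: %.2f us/op%n",
                unmarshalNanos / 1000.0 / iterations);
    }
}
```

The absolute numbers from a loop like this depend heavily on the JVM and
hardware, so they are only useful for comparing the phases against each
other on the same machine.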

Hope that helps.

>     Could it be due to the way I have set up Hbase?
>     Any help would be appreciated.
> thanks
> karthik
