hbase-user mailing list archives

From Jerry Lam <chiling...@gmail.com>
Subject Re: Performance between HBaseClient scan and HFileReaderV2
Date Thu, 02 Jan 2014 17:18:01 GMT
Hello St.Ack,

I would like to switch to 0.94, but we are using 0.92.1 and will not
upgrade until the end of 2014. I can change the HBase "client" (e.g. to
AsyncHBase) if that is the bottleneck. If the problem is server side (e.g.
the regionserver), is there anything I can do to improve performance?

Best Regards,

Jerry


On Thu, Jan 2, 2014 at 11:23 AM, Stack <stack@duboce.net> wrote:

> On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <chilinglam@gmail.com> wrote:
>
> > Hello HBase users,
> >
> > I just ran a very simple performance test and would like to see whether
> > what I experienced makes sense.
> >
> > The experiment is as follows:
> > - I filled an HBase region with 700MB of data (each row has roughly 45
> > columns and the size is 20KB for the entire row)
> > - I configured the region to hold 4GB (therefore no split occurs)
> > - I ran compactions after the data was loaded and made sure that there is
> > only 1 region in the table under test.
> > - No other table exists in the HBase cluster because this is a DEV
> > environment
> > - I'm using HBase 0.92.1
> >
> >
> Can you use 0.94?  It has had some scanner improvements.
>
> Thanks,
> St.Ack
>
>
>
> > The test is very basic. I use HBaseClient to scan the entire region to
> > retrieve all rows and all columns in the table, just iterating over all
> > KeyValue pairs until it is done. It took about 1 minute 22 sec to
> > complete. (Note that I disabled the block cache and used a caching size
> > of about 10000.)
> >
> > I ran another test using HFileReaderV2 to scan the entire region and
> > retrieve all rows and all columns, just iterating over all KeyValue pairs
> > until it is done. It took 11 sec.
> >
> > The performance difference is dramatic (roughly 7.5 times faster using
> > HFileReaderV2).
> >
> > I want to know why the difference is so big, or whether I didn't
> > configure HBase properly. This experiment suggests that HDFS can deliver
> > the data efficiently, so it is not the bottleneck.
> >
> > Any help is appreciated!
> >
> > Jerry
> >
> >
>
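
For reference, the figures quoted above can be turned into rough throughput
and batch-size numbers. The following is only a back-of-the-envelope sketch
in Java, not a measurement: the class and method names are made up for
illustration, it assumes 1MB = 1024KB, and it treats the approximate sizes
from the original post (700MB of data, 20KB per row, caching of 10000, 82 sec
versus 11 sec) as exact.

```java
public class ScanThroughput {
    // Average throughput in MB/s for reading dataMb megabytes in `seconds`.
    static double throughputMBps(double dataMb, double seconds) {
        return dataMb / seconds;
    }

    // Approximate number of rows, assuming 1MB = 1024KB.
    static long approxRows(double dataMb, double rowKb) {
        return Math.round(dataMb * 1024.0 / rowKb);
    }

    // Approximate payload (in MB) of one scanner fetch that returns
    // `caching` rows of rowKb kilobytes each.
    static double batchMb(int caching, double rowKb) {
        return caching * rowKb / 1024.0;
    }

    // Number of fetches needed to return `rows` rows, `caching` rows at a time.
    static long fetches(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        double dataMb = 700.0;   // data in the region, from the post
        double rowKb = 20.0;     // size of one row, from the post
        int caching = 10000;     // caching size, from the post
        double clientSec = 82.0; // 1 minute 22 sec via the client scan
        double hfileSec = 11.0;  // via HFileReaderV2

        long rows = approxRows(dataMb, rowKb);
        System.out.printf("client scan:   %.1f MB/s%n", throughputMBps(dataMb, clientSec));
        System.out.printf("HFileReaderV2: %.1f MB/s%n", throughputMBps(dataMb, hfileSec));
        System.out.printf("speed-up:      %.2fx%n", clientSec / hfileSec);
        System.out.printf("rows:          ~%d%n", rows);
        System.out.printf("per fetch:     ~%.0f MB in %d fetches%n",
                batchMb(caching, rowKb), fetches(rows, caching));
    }
}
```

Under these assumptions, a caching size of 10000 with 20KB rows means each
fetch carries a couple of hundred MB, so the whole region comes back in only a
handful of very large responses. Whether that contributes to the gap is not
something these numbers can show by themselves.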
