hbase-user mailing list archives

From Jerry Lam <chiling...@gmail.com>
Subject Re: Performance between HBaseClient scan and HFileReaderV2
Date Thu, 02 Jan 2014 15:56:59 GMT
Hi Tom,

Good point. Note that I also ran the HBaseClient performance test several
times (as you can see from the chart), so any OS caching should have
benefited the later HBaseClient runs as well, not just the HFileReaderV2
test.

I still don't understand why HBaseClient performs so poorly compared to
reading directly from HDFS. I could understand a factor of 2 (even that
seems too much), but a factor of 8 is quite unreasonable.
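
For reference, the figures quoted from the original message below can be
turned into rough throughput numbers. This is only a back-of-the-envelope
sketch: it assumes 1 MB = 1024 KB, that every row is exactly 20KB, and that
the "caching size" of 10000 means rows fetched per scanner RPC. The variable
names are made up for illustration.

```python
# Back-of-the-envelope numbers for the two tests described in this thread.
DATA_MB = 700            # data loaded into the region
ROW_KB = 20              # approximate size of one row (assumed exact here)
CLIENT_SECONDS = 82      # HBaseClient scan: 1 minute 22 sec
HFILE_SECONDS = 11       # HFileReaderV2 scan
SCANNER_CACHING = 10000  # assumption: rows returned per scanner RPC


def throughput_mb_per_s(data_mb, seconds):
    """Average scan throughput in MB/s."""
    return data_mb / seconds


client_rate = throughput_mb_per_s(DATA_MB, CLIENT_SECONDS)
hfile_rate = throughput_mb_per_s(DATA_MB, HFILE_SECONDS)
slowdown = CLIENT_SECONDS / HFILE_SECONDS

# Assumes 1 MB = 1024 KB.
approx_rows = DATA_MB * 1024 // ROW_KB
# Ceiling division: number of RPCs needed to return all rows.
scan_rpcs = -(-approx_rows // SCANNER_CACHING)
mb_per_rpc = SCANNER_CACHING * ROW_KB / 1024

print(f"HBaseClient:   {client_rate:.1f} MB/s")
print(f"HFileReaderV2: {hfile_rate:.1f} MB/s")
print(f"Slowdown:      {slowdown:.2f}x")
print(f"Rows: ~{approx_rows}, data-carrying RPCs: ~{scan_rpcs}, "
      f"~{mb_per_rpc:.0f} MB per full RPC")
```

Under these assumptions the client scan averages about 8.5 MB/s against
about 63.6 MB/s for the direct read, and the whole table comes back in only
about 4 very large responses, so the number of round trips by itself is
unlikely to explain the gap.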

Any hint?

Jerry



On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood <tom.w.hood@gmail.com> wrote:

> I'm also new to HBase and am not familiar with HFileReaderV2.  However, in
> your description, you didn't mention clearing the Linux OS cache between
> tests.  That might be why you're seeing the big difference: if you ran the
> HBaseClient test first, it may have warmed the OS cache, and HFileReaderV2
> then benefited from it.  Just a guess...
>
> -- Tom
>
>
>
> On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <chilinglam@gmail.com> wrote:
>
> > Hello HBase users,
> >
> > I just ran a very simple performance test and would like to see if what I
> > experienced makes sense.
> >
> > The experiment is as follows:
> > - I filled an HBase region with 700MB of data (each row has roughly 45
> > columns and is about 20KB in total)
> > - I configured the region to hold 4GB (therefore no split occurs)
> > - I ran compactions after the data was loaded and made sure that there
> > was only 1 region in the table under test.
> > - No other table exists in the hbase cluster because this is a DEV
> > environment
> > - I'm using HBase 0.92.1
> >
> > The test is very basic. I use HBaseClient to scan the entire region and
> > retrieve all rows and all columns in the table, just iterating over all
> > KeyValue pairs until it is done. It took about 1 minute 22 sec to
> > complete. (Note that I disabled the block cache and used a caching size
> > of about 10000.)
> >
> > I ran another test using HFileReaderV2 to scan the entire region and
> > retrieve all rows and all columns, again just iterating over all KeyValue
> > pairs until it is done. It took 11 sec.
> >
> > The performance difference is dramatic (almost 8 times faster using
> > HFileReaderV2).
> >
> > I want to know why the difference is so big, or whether I didn't
> > configure HBase properly. From this experiment, HDFS can deliver the data
> > efficiently, so it is not the bottleneck.
> >
> > Any help is appreciated!
> >
> > Jerry
> >
> >
>
