hbase-user mailing list archives

From Jerry Lam <chiling...@gmail.com>
Subject Re: Performance between HBaseClient scan and HFileReaderV2
Date Thu, 02 Jan 2014 23:53:16 GMT
Hello Lars,

Yes, I used setCaching to get more KeyValues in each RPC call. Also
yes, when I used HFileReaderV2 I was still reading from HDFS. Short circuiting
is enabled but I don't know how to verify that it is being used (is there a log
that can tell me whether it has been used?).
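
To put the caching value in perspective, here is a rough back-of-the-envelope sketch using the approximate figures from my original post (700MB of data, about 20KB per row, caching of about 10000). The class and method names are made up for illustration, and it assumes each scanner RPC returns up to `caching` rows with no size cap.

```java
// Back-of-the-envelope estimate only; inputs are approximate figures
// from the original post, and all names here are hypothetical.
final class ScanBatchEstimate {

    /** Approximate row count, given total data size in MB and row size in KB. */
    static long estimateRows(long totalMb, long rowKb) {
        return (totalMb * 1024) / rowKb;
    }

    /** Scanner round trips needed to fetch all rows (ceiling division). */
    static long estimateBatches(long rows, long caching) {
        return (rows + caching - 1) / caching;
    }

    /** Approximate payload of one full batch, in MB. */
    static double batchPayloadMb(long caching, long rowKb) {
        return caching * rowKb / 1024.0;
    }

    public static void main(String[] args) {
        long rows = estimateRows(700, 20);
        long batches = estimateBatches(rows, 10_000);
        System.out.printf("rows=%d batches=%d payload per full batch=%.1f MB%n",
                rows, batches, batchPayloadMb(10_000, 20));
    }
}
```

Under these assumptions the scan covers roughly 35,840 rows in about 4 round trips, with each full batch carrying close to 195MB, so the per-call RPC overhead itself is unlikely to be what dominates.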

I did make sure the HBaseClient runs on the same regionserver that holds
the data.

I just tried asynchbase (as I'm running out of ideas, I started to try
everything), it takes 60 seconds to scan through the data (20 seconds less
than using HBaseClient).
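
For comparison, a small sketch that turns the timings reported in this thread into throughput. It assumes roughly 700MB of data, 82 seconds for HBaseClient (1 minute 22 sec), 60 seconds for asynchbase, and 11 seconds for HFileReaderV2; the class and method names are only illustrative.

```java
// Rough throughput comparison; the inputs are the approximate timings
// reported in this thread, and all names here are hypothetical.
final class ScanThroughput {

    /** Throughput in MB/s for a given data size and elapsed time. */
    static double mbPerSecond(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    /** How many times slower the slow path is than the fast path. */
    static double slowdown(double slowSeconds, double fastSeconds) {
        return slowSeconds / fastSeconds;
    }

    public static void main(String[] args) {
        double dataMb = 700.0;
        String[] names = {"HBaseClient", "asynchbase", "HFileReaderV2"};
        double[] seconds = {82.0, 60.0, 11.0};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-14s %5.1f MB/s, %.1fx the HFileReaderV2 time%n",
                    names[i], mbPerSecond(dataMb, seconds[i]),
                    slowdown(seconds[i], seconds[2]));
        }
    }
}
```

With these inputs HBaseClient comes out at about 8.5MB/s, asynchbase at about 11.7MB/s, and HFileReaderV2 at about 63.6MB/s, so asynchbase is still more than 5 times slower than reading the HFile directly.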

Best Regards,

Jerry

On Thu, Jan 2, 2014 at 4:44 PM, lars hofhansl <larsh@apache.org> wrote:

> From the below I gather you set scanner caching (Scan.setCaching(...))?
> When you use HFileReaderV2, you're still reading from HDFS, right? Are you
> using short circuit reading (avoiding network IO)?
>
> In the HBaseClient client you pipe all the data through the network again.
> Is the HBaseClient located on a different machine?
>
> I would use a profiler (just use jVisualVM, which ships with the JDK and
> use the "sampling" profiler) to see where the time is spent.
>
> Lastly, to echo what other folks have said, 0.92 is pretty old at this
> point and I personally added a lot of performance improvements to HBase
> during the 0.94 timeframe and others have as well.
> If you could test the same with 0.94, I'd be very interested in the
> numbers.
>
> -- Lars
>
>
>
> ________________________________
>  From: Jerry Lam <chilinglam@gmail.com>
> To: user <user@hbase.apache.org>
> Sent: Thursday, January 2, 2014 1:32 PM
> Subject: Re: Performance between HBaseClient scan and HFileReaderV2
>
>
> Hello Vladimir,
>
> In my use case, I guarantee that a major compaction is executed before any
> scan happens because the system we are building is a read-only system. There
> will be no deleted cells. Additionally, I only need to read from a single
> column family and therefore I don't need to access multiple HFiles.
>
> Filter conditions are nice to have because if I can read HFile 8x faster
> than using HBaseClient, I can do the filter on the client side and still
> perform faster than using HBaseClient.
>
> Thank you for your input!
>
> Jerry
>
>
>
>
> On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov
> <vrodionov@carrieriq.com> wrote:
>
> > HBase scanner MUST guarantee correct order of KeyValues (coming from
> > different HFiles), apply filter conditions (including restrictions on
> > included column families and qualifiers, time range, and max versions),
> > and correctly process deleted cells.
> > Direct HFileReader does none of the above.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodionov@carrieriq.com
> >
> > ________________________________________
> > From: Jerry Lam [chilinglam@gmail.com]
> > Sent: Thursday, January 02, 2014 7:56 AM
> > To: user
> > Subject: Re: Performance between HBaseClient scan and HFileReaderV2
> >
> > Hi Tom,
> >
> > Good point. Note that I also ran the HBaseClient performance test several
> > times (as you can see from the chart). The caching should also benefit the
> > second time I ran the HBaseClient performance test, not just benefit the
> > HFileReaderV2 test.
> >
> > I still don't understand what makes the HBaseClient perform so poorly in
> > comparison to accessing HDFS directly. I can understand maybe a factor of 2
> > (even that is too much) but a factor of 8 is quite unreasonable.
> >
> > Any hint?
> >
> > Jerry
> >
> >
> >
> > On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood <tom.w.hood@gmail.com> wrote:
> >
> > > I'm also new to HBase and am not familiar with HFileReaderV2.  However, in
> > > your description, you didn't mention anything about clearing the Linux OS
> > > cache between tests.  That might be why you're seeing the big difference: if
> > > you ran the HBaseClient test first, it may have warmed the OS cache and
> > > then HFileReaderV2 benefited from it.  Just a guess...
> > >
> > > -- Tom
> > >
> > >
> > >
> > > On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <chilinglam@gmail.com> wrote:
> > >
> > > > Hello HBase users,
> > > >
> > > > I just ran a very simple performance test and would like to see if what I
> > > > experienced makes sense.
> > > >
> > > > The experiment is as follows:
> > > > - I filled an HBase region with 700MB of data (each row has roughly 45
> > > >   columns and the size is 20KB for the entire row)
> > > > - I configured the region to hold 4GB (therefore no split occurs)
> > > > - I ran compactions after the data was loaded and made sure that there is
> > > >   only 1 region in the table under test.
> > > > - No other table exists in the hbase cluster because this is a DEV
> > > > environment
> > > > - I'm using HBase 0.92.1
> > > >
> > > > The test is very basic. I use HBaseClient to scan the entire region to
> > > > retrieve all rows and all columns in the table, just iterating over all
> > > > KeyValue pairs until it is done. It took about 1 minute 22 sec to complete.
> > > > (Note that I disabled the block cache and used a caching size of about
> > > > 10000.)
> > > >
> > > > I ran another test using HFileReaderV2 to scan the entire region and
> > > > retrieve all rows and all columns, just iterating over all KeyValue pairs
> > > > until it is done. It took 11 sec.
> > > >
> > > > The performance difference is dramatic (almost 8 times faster using
> > > > HFileReaderV2).
> > > >
> > > > I want to know why the difference is so big, or whether I didn't configure
> > > > HBase properly. From this experiment, HDFS can deliver the data
> > > > efficiently, so it is not the bottleneck.
> > > >
> > > > Any help is appreciated!
> > > >
> > > > Jerry
> > > >
> > > >
> > >
> >
>
