hbase-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Slow scanning for PrefixFilter on EncodedBlocks
Date Wed, 17 Oct 2012 16:41:06 GMT
Hi Zahoor,

I use PrefixFilter heavily, and every time I have to define the startRow
explicitly as well; that's the current behavior. Initially this behavior
confused me too.
I think that when a PrefixFilter is defined, the scan could internally set
startRow = prefix, with a user-defined startRow taking precedence over the
one derived from the PrefixFilter. If PrefixFilter were modified in that
way, it would eliminate this confusion about prefix filter performance.
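
A minimal sketch of the proposal above, assuming plain `byte[]` row keys ordered lexicographically as unsigned bytes (as HBase orders them). `PrefixScanRange`, `effectiveStartRow` and `prefixStopRow` are hypothetical names, not HBase API; with the HBase client, their results would be passed to `Scan.setStartRow` and `Scan.setStopRow` alongside the `PrefixFilter`. The stop-row computation goes one step beyond the message: it also bounds the end of the scan so it does not run past the prefix range.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of HBase) that derives scan boundaries
 * from a PrefixFilter's prefix, as proposed in the message above.
 */
final class PrefixScanRange {
    private PrefixScanRange() {}

    /** A user-supplied start row takes precedence; otherwise start at the prefix. */
    static byte[] effectiveStartRow(byte[] userStartRow, byte[] prefix) {
        if (userStartRow != null && userStartRow.length > 0) {
            return userStartRow;
        }
        return prefix;
    }

    /**
     * Smallest row key greater than every key that starts with the prefix:
     * drop trailing 0xFF bytes, then increment the last remaining byte.
     * Returns an empty array ("scan to the end of the table") when the
     * prefix is empty or consists only of 0xFF bytes.
     */
    static byte[] prefixStopRow(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++; // unsigned increment is safe: this byte is not 0xFF
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "user42|".getBytes(StandardCharsets.UTF_8);
        byte[] start = effectiveStartRow(null, prefix);
        byte[] stop = prefixStopRow(prefix);
        System.out.println(new String(start, StandardCharsets.UTF_8)); // user42|
        System.out.println(new String(stop, StandardCharsets.UTF_8));  // user42}
    }
}
```

Because the stop row is exclusive, every key beginning with `user42|` falls in the range `[user42|, user42})`, so the filter only has to confirm rows the scan already seeks to.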

Thanks,
Anil Gupta

On Wed, Oct 17, 2012 at 3:44 AM, J Mohamed Zahoor <jmozah@gmail.com> wrote:

> First I upgraded my cluster to 0.94.2, but the problem persisted.
> Then I switched to using startRow instead of the prefix filter.
>
>
> ,/zahoor
>
> On Wed, Oct 17, 2012 at 2:12 PM, J Mohamed Zahoor <jmozah@gmail.com> wrote:
>
> > Sorry for the delay.
> >
> > It looks like the problem is caused by PrefixFilter.
> > I assumed that it does a seek.
> >
> > If I use startRow instead, it works fine. But is that the correct
> > approach?
> >
> > ./zahoor
> >
> >
> > On Wed, Oct 17, 2012 at 3:38 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> >
> >> I reopened HBASE-6577
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: lars hofhansl <lhofhansl@yahoo.com>
> >> To: "user@hbase.apache.org" <user@hbase.apache.org>; lars hofhansl <lhofhansl@yahoo.com>
> >> Cc:
> >> Sent: Tuesday, October 16, 2012 2:39 PM
> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >>
> >> Looks like this is exactly the scenario I was trying to optimize with
> >> HBASE-6577. Hmm...
> >> ________________________________
> >> From: lars hofhansl <lhofhansl@yahoo.com>
> >> To: "user@hbase.apache.org" <user@hbase.apache.org>
> >> Sent: Tuesday, October 16, 2012 12:21 AM
> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >>
> >> PrefixFilter does not do any seeking by itself, so I doubt this is
> >> related to HBASE-6757.
> >> Does this only happen with FAST_DIFF compression?
> >>
> >>
> >> If you can create an isolated test program (one that sets up the scenario
> >> and then runs a scan with the filter such that it is very slow), I'm
> >> happy to take a look.
> >>
> >> -- Lars
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: J Mohamed Zahoor <jmozah@gmail.com>
> >> To: "user@hbase.apache.org" <user@hbase.apache.org>
> >> Cc:
> >> Sent: Monday, October 15, 2012 10:27 AM
> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >>
> >> Is this related to HBASE-6757?
> >> I use a FilterList containing:
> >>   - a PrefixFilter
> >>   - a FilterList of column filters
> >>
> >> /zahoor
> >>
> >> On Monday, October 15, 2012, J Mohamed Zahoor wrote:
> >>
> >> > Hi
> >> >
> >> > My scanner is very slow when using a PrefixFilter on an
> >> > **Encoded Column** (encoded using FAST_DIFF both in memory and on disk).
> >> > I am using HBase 0.94.1.
> >> >
> >> > jstack shows that a lot of time is spent seeking to the row.
> >> > Even if I give an exact row key match in the prefix filter, it takes
> >> > about two minutes to return a single row.
> >> > Running this multiple times also seems to send reads to disk
> >> > (loadBlock).
> >> >
> >> >
> >> >   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
> >> >   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
> >> >   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
> >> >   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
> >> >   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
> >> >   at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
> >> >   at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
> >> >   - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> >> >   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
> >> >   - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> >> >   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
> >> >   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3455)
> >> >   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
> >> >   - locked <0x000000059589bb30> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
> >> >   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
> >> >
> >> > If I set the start and end row to the same row in the scan, it comes
> >> > back very quickly.
> >> >
> >> > I saw this link:
> >> > http://search-hadoop.com/m/9f0JH1Kz24U1&subj=Re+HBase+0+94+2+SNAPSHOT+Scanning+Bug
> >> > But it looks like things are fine in 0.94.1.
> >> >
> >> > Any pointers on why this is slow?
> >> >
> >> >
> >> > Note: the row does not have many columns (5 columns, less than a KB),
> >> > but it has lots of versions (1500+).
> >> >
> >> > ./zahoor
> >> >
> >> >
> >> >
> >>
> >>
> >
>
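
The thread's diagnosis (PrefixFilter does not seek by itself, while setting a startRow does) can be illustrated with a toy model, assuming a table is just a sorted map of string row keys. `PrefixScanModel` and its methods are hypothetical and are not HBase code; the model ignores block encoding, versions and the reseek path in the stack trace, and it assumes the filter-only scan may stop once rows sort past the prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy model (not HBase code): a sorted table scanned with a filter that
 * only accepts or rejects rows, versus a scan that seeks to a start row.
 */
final class PrefixScanModel {
    private PrefixScanModel() {}

    /** Matching rows plus the number of rows the scan had to examine. */
    static final class ScanResult {
        final List<String> rows = new ArrayList<>();
        int examined;
    }

    /** Filter-only scan: begin at the first row and test every row key. */
    static ScanResult filterOnlyScan(NavigableMap<String, String> table, String prefix) {
        ScanResult result = new ScanResult();
        for (String row : table.keySet()) {
            result.examined++;
            if (row.startsWith(prefix)) {
                result.rows.add(row);
            } else if (row.compareTo(prefix) > 0) {
                break; // sorted past the prefix range: nothing later can match
            }
        }
        return result;
    }

    /** Seek-based scan: jump straight to the first row >= prefix. */
    static ScanResult seekScan(NavigableMap<String, String> table, String prefix) {
        ScanResult result = new ScanResult();
        for (String row : table.tailMap(prefix, true).keySet()) {
            result.examined++;
            if (!row.startsWith(prefix)) {
                break; // first row past the prefix range ends the scan
            }
            result.rows.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 10_000; i++) {
            table.put(String.format("row%05d", i), "value");
        }
        ScanResult slow = filterOnlyScan(table, "row0500");
        ScanResult fast = seekScan(table, "row0500");
        System.out.println(slow.rows.size() + " rows, examined " + slow.examined); // 10 rows, examined 5011
        System.out.println(fast.rows.size() + " rows, examined " + fast.examined); // 10 rows, examined 11
    }
}
```

Both scans return the same 10 rows, but the filter-only scan reads and rejects every row that sorts before the prefix, which is the cost a startRow avoids.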



