hbase-dev mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Strange performance behavior of SingleValColumnFilter
Date Sun, 23 Oct 2011 00:16:03 GMT
Thanks N.

I do not think the time is lost in the memstore. We're working with fully compacted
tables and do no updates during the read testing.

We'll be spending more time to track this down on Monday.


-- Lars

________________________________
From: N Keywal <nkeywal@gmail.com>
To: dev@hbase.apache.org
Sent: Saturday, October 22, 2011 2:53 PM
Subject: Re: Strange performance behavior of SingleValColumnFilter

Hi,

I made a change recently in this area. It was meant to fix a consistency bug
rather than to improve performance, but in my tests performance actually
improved as well. It affected the MemStore only. Is the time lost in the
memstore or in the persisted part?

Cheers,

N.

On Sat, Oct 22, 2011 at 6:22 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:

> No, it was a trunk build. I ran the local tests with a build from today.
> The build on our test cluster is 1 or 2 weeks old.
>
> It seems it is just much cheaper to scan through a block that we already
> have, or even to scan into the next block, than to reseek.
>
>
>
> ----- Original Message -----
> From: Ted Yu <yuzhihong@gmail.com>
> To: dev@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
> Cc:
> Sent: Friday, October 21, 2011 8:22 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Was the following evaluation performed on 0.92 ?
> Also, I assume you use ROWCOL bloom filter.
> In TRUNK, Mikhail has put in lazy seek which I think should help
> performance.
>
> Cheers
>
> On Fri, Oct 21, 2011 at 7:34 PM, lars hofhansl <lhofhansl@yahoo.com>
> wrote:
>
> > We found that even with many columns, and even when the filter matches the
> > first column, SKIP is still faster than NEXT_ROW.
> > So either the reseek is extremely inefficient, or there is something else
> > at play.
> >
> > It might be worthwhile to have StoreScanner upon SEEK_NEXT_ROW try the next
> > N KVs (maybe N=10 or 20 or even bigger) to see if we get to the next row,
> > and only if we didn't reach the next row do the reseek.
> >
> > ________________________________
> > From: lars hofhansl <lhofhansl@yahoo.com>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <
> > lhofhansl@yahoo.com>
> > Sent: Friday, October 21, 2011 4:34 PM
> > Subject: Re: Strange performance behavior of SingleValColumnFilter
> >
> > Maybe it even makes sense. When the scan is limited to one column and there
> > is only one version, SKIP would skip to the next row.
> > But 10x slower for NEXT_ROW seems extreme.
> >
> >
> >
> > ________________________________
> > From: lars hofhansl <lhofhansl@yahoo.com>
> > To: hbase-dev <dev@hbase.apache.org>
> > Sent: Friday, October 21, 2011 3:49 PM
> > Subject: Strange performance behavior of SingleValColumnFilter
> >
> > We have been doing some performance testing on HBase filters. One outcome
> > was HBASE-4626 (which I fixed and committed last night).
> >
> > Now we found a rather strange behavior with SingleColumnValueFilter. On our
> > test cluster it is 10x slower than ValueFilter, even when we restrict the
> > scan to just the one column we are filtering on and set filterIfMissing to
> > true.
> > We are not seeing that with HBase in local mode, which points to some
> > additional activity on the FS, which in HDFS would be slow compared to a
> > local FS.
> >
> >
> > Indeed, it turns out the problem goes away when we replace all NEXT_ROW with
> > SKIP in SingleColumnValueFilter.filterKeyValue: the performance is then
> > *much* better (on par with ValueFilter).
> >
> >
> > We're using something pretty close to trunk for our tests.
> > The tables are pretty wide, with only one version of each cell (and freshly
> > major compacted).
> >
> >
> > I do not know this part of the code that well (yet) and was wondering if
> > somebody could chime in. Maybe this is related to HFileV2?
> >
> > I do recall there was something done to optimize reseeks. Generally I would
> > have expected NEXT_ROW to be a major performance improvement.
> >
> > Any ideas, comments, pointers?
> >
> > Thanks.
> >
> > -- Lars
> >
>
>
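
The heuristic proposed in the thread (on SEEK_NEXT_ROW, step through the next
N KVs and reseek only if the next row has not been reached) can be sketched in
plain Java as below. This is not StoreScanner code: the class name, the method
name, and the flat sorted list of row keys (one entry per KV) are hypothetical
stand-ins, and the binary search merely stands in for whatever a real reseek
would do.

```java
import java.util.List;

/** Hypothetical sketch of "try the next N KVs, then reseek"; not HBase code. */
final class RowSkipSketch {

    /**
     * rowOfEachKv holds the row key of every KV in sorted order (assumed layout).
     * Returns the index of the first KV belonging to a later row than the KV at
     * pos, or rowOfEachKv.size() if there is no later row.
     */
    static int seekNextRow(List<String> rowOfEachKv, int pos, int maxSkips) {
        String currentRow = rowOfEachKv.get(pos);
        int i = pos + 1;

        // Phase 1: cheap sequential steps, at most maxSkips of them.
        for (int steps = 0; steps < maxSkips && i < rowOfEachKv.size(); steps++, i++) {
            if (!rowOfEachKv.get(i).equals(currentRow)) {
                return i; // reached the next row without a reseek
            }
        }
        if (i >= rowOfEachKv.size()) {
            return rowOfEachKv.size(); // ran off the end while skipping
        }

        // Phase 2: stand-in for the reseek, a binary search for the first
        // KV whose row sorts after currentRow.
        int lo = i;
        int hi = rowOfEachKv.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rowOfEachKv.get(mid).compareTo(currentRow) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r1", "r1", "r1", "r1", "r2", "r2", "r3");
        // With a small N the sketch falls through to the "reseek" path.
        System.out.println(seekNextRow(rows, 0, 2));
        // With a larger N it reaches the next row by skipping alone.
        System.out.println(seekNextRow(rows, 0, 10));
    }
}
```

Both paths return the same position; the only difference is how it is reached,
which is where the cost difference discussed above would come from. A suitable
value of N would have to be measured, as the thread suggests.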
