hbase-dev mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Strange performance behavior of SingleValColumnFilter
Date Sat, 22 Oct 2011 04:22:08 GMT
No, it was a trunk build. I ran the local tests with a build from today.
The build on our test cluster is 1 or 2 weeks old.

It seems it is just much cheaper to scan through a block that we already have, or even to
scan into the next block, than to reseek.
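
The idea quoted further down (on SEEK_NEXT_ROW, try the next N KVs first and reseek
only if the next row was not reached) can be sketched as a small, self-contained Java
model. This is not HBase code: Cell, Stats, buildTable, seekNextRow, and visitRows are
hypothetical names made up for illustration, a binary search stands in for the reseek,
and the counters only count operations. They say nothing about what each operation
costs on HDFS.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a sorted store scanned row by row. Not HBase code. */
final class SkipThenReseekSketch {

    /** Hypothetical stand-in for a KeyValue: only row and column matter here. */
    static final class Cell {
        final String row;
        final String column;
        Cell(String row, String column) { this.row = row; this.column = column; }
    }

    /** Operation counters so different values of N can be compared. */
    static final class Stats {
        int steps;    // sequential "next KV" moves
        int reseeks;  // fallback repositioning calls
    }

    /** Builds a sorted table with the given number of rows and columns per row. */
    static List<Cell> buildTable(int rows, int columnsPerRow) {
        List<Cell> cells = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < columnsPerRow; c++) {
                // Zero-padded names keep string order equal to numeric order.
                cells.add(new Cell(String.format("row%04d", r), String.format("col%04d", c)));
            }
        }
        return cells;
    }

    /**
     * Returns the index of the first cell after pos that belongs to a different row,
     * or cells.size() if there is none. Tries up to maxSkips sequential steps first
     * and only then falls back to a binary search (the stand-in for a reseek).
     */
    static int seekNextRow(List<Cell> cells, int pos, int maxSkips, Stats stats) {
        String currentRow = cells.get(pos).row;
        int i = pos;
        for (int n = 0; n < maxSkips; n++) {
            i++;
            stats.steps++;
            if (i >= cells.size() || !cells.get(i).row.equals(currentRow)) {
                return i; // reached the next row (or the end) without a reseek
            }
        }
        // Still inside the same row after maxSkips steps: do the "reseek".
        stats.reseeks++;
        int lo = i + 1;
        int hi = cells.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cells.get(mid).row.compareTo(currentRow) > 0) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    /** Visits the first cell of every row and returns the number of rows seen. */
    static int visitRows(List<Cell> cells, int maxSkips, Stats stats) {
        int rows = 0;
        int pos = 0;
        while (pos < cells.size()) {
            rows++;
            pos = seekNextRow(cells, pos, maxSkips, stats);
        }
        return rows;
    }

    public static void main(String[] args) {
        Stats stats = new Stats();
        int rows = visitRows(buildTable(100, 5), 10, stats);
        // prints: 100 rows, 500 steps, 0 reseeks
        System.out.println(rows + " rows, " + stats.steps + " steps, " + stats.reseeks + " reseeks");
    }
}
```

With 5 columns per row and N=10 the model never reseeks. With 50 columns per row it
takes 10 steps and then one reseek per row, so the wasted steps are bounded by N. Whether
that trade is worthwhile in StoreScanner would have to be measured on a real cluster.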



----- Original Message -----
From: Ted Yu <yuzhihong@gmail.com>
To: dev@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
Cc: 
Sent: Friday, October 21, 2011 8:22 PM
Subject: Re: Strange performance behavior of SingleValColumnFilter

Was the following evaluation performed on 0.92?
Also, I assume you use a ROWCOL bloom filter.
In TRUNK, Mikhail has put in lazy seek, which I think should help
performance.

Cheers

On Fri, Oct 21, 2011 at 7:34 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:

> We found that even with many columns, and even when the filter matches the
> first column, SKIP is still faster than NEXT_ROW.
> So either the reseek is extremely inefficient, or there is something else
> at play.
>
> It might be worthwhile to have StoreScanner upon SEEK_NEXT_ROW try the next
> N KVs (maybe N=10 or 20 or even bigger) to see if we
> get to the next row, and only if we didn't reach the next row do the
> reseek.
>
> ________________________________
> From: lars hofhansl <lhofhansl@yahoo.com>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <
> lhofhansl@yahoo.com>
> Sent: Friday, October 21, 2011 4:34 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Maybe it even makes sense. When the scan is limited to one column and there
> is only one version, SKIP would skip to the next row.
> But 10x slower for NEXT_ROW seems extreme.
>
>
>
> ________________________________
> From: lars hofhansl <lhofhansl@yahoo.com>
> To: hbase-dev <dev@hbase.apache.org>
> Sent: Friday, October 21, 2011 3:49 PM
> Subject: Strange performance behavior of SingleValColumnFilter
>
> We have been doing some performance testing on HBase filters. One outcome
> was HBASE-4626 (which I fixed and committed last night).
>
> Now we found a rather strange behavior with SingleColumnValueFilter. On our
> test cluster it is 10x slower than ValueFilter, even when we restrict the
> scan to just the one column we are filtering on and set filterIfMissing to
> true.
> We are not seeing that with HBase in local mode, which points to some
> additional activity on the FS, which in HDFS would be slow compared to a
> local FS.
>
>
> Indeed, it turns out the problem goes away when we replace all NEXT_ROW with
> SKIP in SingleColumnValueFilter.filterKeyValue: the performance is *much*
> better (on par with ValueFilter).
>
>
> We're using something pretty close to trunk for our tests.
> The tables are pretty wide, with only one version of each cell (and freshly
> major compacted).
>
>
> I do not know this part of the code that well (yet) and was wondering if
> somebody could chime in. Maybe this is related to HFileV2?
>
> I do recall there was something done to optimize reseeks. Generally I would
> have expected NEXT_ROW to be a major performance improvement.
>
> Any ideas, comments, pointers?
>
> Thanks.
>
> -- Lars
>

