hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Strange performance behavior of SingleValColumnFilter
Date Sat, 22 Oct 2011 03:22:51 GMT
Was the following evaluation performed on 0.92?
Also, I assume you use a ROWCOL bloom filter.
In TRUNK, Mikhail has put in lazy seek, which I think should help.


On Fri, Oct 21, 2011 at 7:34 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:

> We found that even with many columns, and even when the filter matches the
> first column, SKIP is still faster than NEXT_ROW.
> So either the reseek is extremely inefficient, or there is something else
> at play.
> It might be worthwhile to have StoreScanner upon SEEK_NEXT_ROW try the next
> N KVs (maybe N=10 or 20 or even bigger) to see if we
> get to the next row, and only if we didn't reach the next row do the
> reseek.
> ________________________________
> From: lars hofhansl <lhofhansl@yahoo.com>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <
> lhofhansl@yahoo.com>
> Sent: Friday, October 21, 2011 4:34 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
> Maybe it even makes sense. When the scan is limited to one column and there
> is only one version, SKIP would skip to the next row.
> But 10x slower for NEXT_ROW seems extreme.
> ________________________________
> From: lars hofhansl <lhofhansl@yahoo.com>
> To: hbase-dev <dev@hbase.apache.org>
> Sent: Friday, October 21, 2011 3:49 PM
> Subject: Strange performance behavior of SingleValColumnFilter
> We have been doing some performance testing on HBase filters. One outcome
> was HBASE-4626 (which I fixed and committed last night).
> Now we found a rather strange behavior with SingleColumnValueFilter. On our
> test cluster it is 10x slower than ValueFilter, even when we restrict the
> scan to just the one column we are filtering on and set filterIfMissing to
> true.
> We are not seeing that with HBase in local mode, which points to some
> additional activity on the FS, which in HDFS would be slow compared to a
> local FS.
> Indeed, it turns out the problem goes away when we replace all NEXT_ROW with
> SKIP in SingleColumnValueFilter.filterKeyValue: the performance is *much*
> better (on par with ValueFilter).
> We're using something pretty close to trunk for our tests.
> The tables are pretty wide, with only one version of each cell (and freshly
> major compacted).
> I do not know this part of the code that well (yet) and was wondering if
> somebody could chime in. Maybe this is related to HFileV2?
> I do recall there was something done to optimize reseeks. Generally I would
> have expected NEXT_ROW to be a major performance improvement.
> Any ideas, comments, pointers?
> Thanks.
> -- Lars
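
The idea floated in the thread (on SEEK_NEXT_ROW, step over the next N KVs first and reseek only if the next row was not reached) can be sketched roughly as below. This is a toy simulation, not StoreScanner code: the class LookaheadSketch, its fields, and the methods buildRows, seekNextRow, and scanAllRows are all hypothetical names, cells are reduced to their row id, and a binary search stands in for the real reseek.

```java
// Toy model of "try N cheap next() calls before falling back to a reseek".
// All names here are hypothetical; this is not HBase's StoreScanner.
class LookaheadSketch {
    int cheapSteps = 0;  // simulated next() calls
    int reseeks = 0;     // simulated reseek() calls

    private final int[] rows;  // row id of each cell, in sorted scan order
    private int pos = 0;       // index of the current cell

    LookaheadSketch(int[] rows) {
        this.rows = rows;
    }

    /** Builds a table of rowCount rows with cellsPerRow cells each. */
    static int[] buildRows(int rowCount, int cellsPerRow) {
        int[] rows = new int[rowCount * cellsPerRow];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = i / cellsPerRow;
        }
        return rows;
    }

    /**
     * Moves to the first cell of the next row and returns its index,
     * or rows.length if there is no further row.
     */
    int seekNextRow(int lookahead) {
        if (pos >= rows.length) {
            return pos;
        }
        int currentRow = rows[pos];
        // Cheap path: step forward at most `lookahead` cells.
        for (int i = 0; i < lookahead; i++) {
            pos++;
            cheapSteps++;
            if (pos == rows.length || rows[pos] != currentRow) {
                return pos;  // reached the next row (or the end) without a reseek
            }
        }
        // Fallback: simulated reseek, a binary search for the first cell
        // whose row id is greater than currentRow.
        reseeks++;
        int lo = pos;
        int hi = rows.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rows[mid] > currentRow) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        pos = lo;
        return pos;
    }

    /** Skips row by row through the whole table; returns {cheapSteps, reseeks}. */
    static int[] scanAllRows(int[] rows, int lookahead) {
        LookaheadSketch s = new LookaheadSketch(rows);
        while (s.seekNextRow(lookahead) < rows.length) {
            // keep skipping to the next row until the table is exhausted
        }
        return new int[] { s.cheapSteps, s.reseeks };
    }

    public static void main(String[] args) {
        int[] wide = buildRows(1000, 50);
        for (int n : new int[] { 10, 20, 100 }) {
            int[] counts = scanAllRows(wide, n);
            System.out.println("N=" + n + " steps=" + counts[0] + " reseeks=" + counts[1]);
        }
    }
}
```

In this model, a table with a single cell per row never reaches the fallback, which matches the remark that SKIP already lands on the next row when the scan is limited to one column with one version. Whether the trade-off pays off on a real cluster depends on the relative cost of next() and reseek(), which the sketch does not measure.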
