hbase-dev mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Strange performance behavior of SingleValColumnFilter
Date Fri, 21 Oct 2011 23:34:43 GMT
Maybe it even makes sense. When the scan is limited to one column and there is only one version,
SKIP already lands on the next row, so NEXT_ROW has nothing extra to skip over.
But 10x slower for NEXT_ROW still seems extreme.

From: lars hofhansl <lhofhansl@yahoo.com>
To: hbase-dev <dev@hbase.apache.org>
Sent: Friday, October 21, 2011 3:49 PM
Subject: Strange performance behavior of SingleValColumnFilter

We have been doing some performance testing on HBase filters. One outcome was HBASE-4626 (which
I fixed and committed last night).

Now we found a rather strange behavior with SingleColumnValueFilter. On our test cluster it
is 10x slower than ValueFilter, even when we restrict the scan to just the one column we are
filtering on and set filterIfMissing to true.
We are not seeing that with HBase in local mode, which points to some additional activity
on the FS, which in HDFS would be slow compared to a local FS.

Indeed, it turns out the problem goes away when we replace all NEXT_ROW with SKIP in
SingleColumnValueFilter.filterKeyValue: the performance is then *much* better (on par with ValueFilter).
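
To make the SKIP vs. NEXT_ROW difference concrete, here is a toy model in plain Java. It is not HBase code: the Cell and Stats classes, the scan loop, and the idea that NEXT_ROW always costs one "seek" are assumptions made for illustration, and the ReturnCode enum only mirrors a subset of the names in Filter.ReturnCode. It just counts how often each strategy steps to the adjacent cell versus asks for a jump to the next row when every row holds exactly one cell.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipVsNextRow {
    // Subset of the filter return codes discussed in this thread (names only).
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    // Hypothetical stand-in for a KeyValue: just a row key and a value.
    static final class Cell {
        final String row;
        final String value;
        Cell(String row, String value) { this.row = row; this.value = value; }
    }

    // Counters collected by the toy scan loop.
    static final class Stats {
        int matched; // cells accepted by the filter
        int steps;   // plain advances to the adjacent cell
        int seeks;   // jumps requested via NEXT_ROW (assumed to be the costly part)
    }

    // Scans the sorted cells; onMismatch is what the filter returns for a non-matching cell.
    static Stats scan(List<Cell> cells, String wanted, ReturnCode onMismatch) {
        Stats stats = new Stats();
        int i = 0;
        while (i < cells.size()) {
            Cell current = cells.get(i);
            if (current.value.equals(wanted)) {
                stats.matched++;       // INCLUDE: keep the cell, move on
                stats.steps++;
                i++;
            } else if (onMismatch == ReturnCode.NEXT_ROW) {
                stats.seeks++;         // modelled as one seek to the next row
                String row = current.row;
                while (i < cells.size() && cells.get(i).row.equals(row)) {
                    i++;
                }
            } else {
                stats.steps++;         // SKIP: just look at the adjacent cell
                i++;
            }
        }
        return stats;
    }

    // Builds a table with one cell per row; every tenth row matches.
    static List<Cell> oneCellPerRow(int rows) {
        List<Cell> cells = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            cells.add(new Cell("row" + r, r % 10 == 0 ? "hit" : "miss"));
        }
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> cells = oneCellPerRow(1000);
        Stats skip = scan(cells, "hit", ReturnCode.SKIP);
        Stats nextRow = scan(cells, "hit", ReturnCode.NEXT_ROW);
        System.out.println("SKIP:     matched=" + skip.matched
                + " steps=" + skip.steps + " seeks=" + skip.seeks);
        System.out.println("NEXT_ROW: matched=" + nextRow.matched
                + " steps=" + nextRow.steps + " seeks=" + nextRow.seeks);
    }
}
```

In this model both strategies return the same 100 matches out of 1000 rows, but SKIP does so with 1000 plain steps and no seeks, while NEXT_ROW issues 900 seeks that each land on the very cell a plain step would have reached. Whether a seek is really that much more expensive than a step in the actual scanner is exactly the open question here.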

We're using something pretty close to trunk for our tests.
The tables are pretty wide, with only one version of each cell (and freshly major compacted).

I do not know this part of the code that well (yet) and was wondering if somebody could chime
in. Maybe this is related to HFileV2?

I do recall there was something done to optimize reseeks. Generally I would have expected
NEXT_ROW to be a major performance improvement.

Any ideas, comments, pointers?


-- Lars