hbase-dev mailing list archives

From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-9811) ColumnPaginationFilter is slow when offset is large
Date Mon, 21 Oct 2013 11:12:41 GMT
Chao Shi created HBASE-9811:

             Summary: ColumnPaginationFilter is slow when offset is large
                 Key: HBASE-9811
                 URL: https://issues.apache.org/jira/browse/HBASE-9811
             Project: HBase
          Issue Type: Bug
            Reporter: Chao Shi

Hi there, we are trying to migrate an app from MySQL to HBase. One kind of query is pagination
with a large offset and a small limit. We don't have many such queries, so both MySQL and
HBase should survive. (MySQL has no index for offset either.)

When comparing the performance of both systems, we found something interesting. We wrote ~1M values
to a single row and queried with offset = 1M, so all values have to be scanned on the RS side.

On MySQL, the first run of the query is pretty slow (more than 1 second), but when the
same query is repeated, the latency becomes very low.

On HBase, on the other hand, repeating the query does not help much (~1s every time). I can confirm
that all the data is in the block cache and all the time is spent on in-memory data processing. (We
have flushed the data to disk.)

I found that "reseek" is the hot spot. It is caused by ColumnPaginationFilter returning NEXT_COL.
If I change that line to return SKIP (which causes the scanner to call next rather than reseek), the
latency is reduced to ~100ms.
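
To make the difference concrete, here is a minimal, self-contained sketch of the behaviour described above. It is a simplified model, not HBase's actual code: the ReturnCode enum, the PaginationFilterSketch class, the skipBelowOffset flag and the scanRow helper are all hypothetical names introduced for illustration. It also assumes one version per column, so that SKIP and NEXT_COL land on the same cell; with multiple versions, SKIP would move to the next version of the same column instead. The sketch only counts how many times the scanner would take the "reseek" path versus the plain "next" path for a given offset and limit.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a column pagination filter; NOT HBase's implementation. */
class PaginationFilterSketch {

    /** Hypothetical stand-in for the subset of filter return codes discussed here. */
    enum ReturnCode { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

    private final int limit;
    private final int offset;
    private final boolean skipBelowOffset; // assumption: toggles the proposed change
    private int count = 0;

    PaginationFilterSketch(int limit, int offset, boolean skipBelowOffset) {
        this.limit = limit;
        this.offset = offset;
        this.skipBelowOffset = skipBelowOffset;
    }

    /** Decide what to do with the next column of the row. */
    ReturnCode filterColumn() {
        if (count >= offset + limit) {
            return ReturnCode.NEXT_ROW; // page is complete
        }
        ReturnCode code;
        if (count < offset) {
            // Columns before the offset are dropped either way; only the hint differs.
            code = skipBelowOffset ? ReturnCode.SKIP : ReturnCode.NEXT_COL;
        } else {
            code = ReturnCode.INCLUDE;
        }
        count++;
        return code;
    }

    /** Result of a simulated scan: the returned page plus call counters. */
    static final class ScanResult {
        final List<String> included = new ArrayList<>();
        int nextCalls;
        int reseekCalls;
    }

    /** Walk one row (one version per column) and count next vs. reseek calls. */
    static ScanResult scanRow(List<String> columns, PaginationFilterSketch filter) {
        ScanResult result = new ScanResult();
        for (String column : columns) {
            ReturnCode code = filter.filterColumn();
            if (code == ReturnCode.NEXT_ROW) {
                break;
            }
            if (code == ReturnCode.NEXT_COL) {
                result.reseekCalls++; // modelled as the expensive path
            } else {
                if (code == ReturnCode.INCLUDE) {
                    result.included.add(column);
                }
                result.nextCalls++; // modelled as the cheap path
            }
        }
        return result;
    }
}
```

In this model, a row with N columns before the offset costs N reseek calls with NEXT_COL and zero with SKIP, while the returned page is identical. The real cost of each call depends on the scanner implementation, so the counters only illustrate where the time goes, not how much is saved.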

So I think there must be some room for optimization.

This message was sent by Atlassian JIRA
