Date: Mon, 21 Oct 2013 11:12:42 +0000 (UTC)
From: "Chao Shi (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Created] (HBASE-9811) ColumnPaginationFilter is slow when offset is large

Chao Shi created HBASE-9811:
-------------------------------

             Summary: ColumnPaginationFilter is slow when offset is large
                 Key: HBASE-9811
                 URL: https://issues.apache.org/jira/browse/HBASE-9811
             Project: HBase
          Issue Type: Bug
            Reporter: Chao Shi

Hi there, we are trying to migrate an app from MySQL to HBase. One kind of query is pagination with a large offset and a small limit. We don't have many such queries, so both MySQL and HBase should be able to handle them. (MySQL has no index for offset either.)

When comparing the performance of the two systems, we found something interesting. The test: write ~1M values into a single row, then query with offset = 1M, so that all values have to be scanned on the RegionServer (RS) side.
When running the query on MySQL, the first run is pretty slow (more than 1 second), but repeating the same query gives very low latency. On HBase, on the other hand, repeating the query does not help much (it stays at ~1s). I can confirm that all the data is in the block cache and that all the time is spent on in-memory data processing. (We have flushed the data to disk.)

I found that "reseek" is the hot spot. It is caused by ColumnPaginationFilter returning NEXT_COL. If I change that line to return SKIP (which causes next to be called rather than reseek), the latency drops to ~100ms. So I think there must be some room for optimization.

-- 
This message was sent by Atlassian JIRA
(v6.1#6144)
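
To make the NEXT_COL versus SKIP observation in the report concrete, here is a toy cost model in plain Java. It is not HBase code and does not use the HBase API. The class name PaginationSketch, its methods, and the choice to model a reseek as a binary search over the remaining cells are all assumptions made for illustration. The only point it carries over from the report is that when every column holds a single version, "seek to the next column" and "step to the next cell" land on the same cell, so any extra work done per seek is multiplied by the offset.

```java
// Toy model only: this is NOT HBase code. A row is a sorted array of unique
// column qualifiers, each with exactly one version (an assumption).
final class PaginationSketch {
    /** Number of key comparisons made by the most recent page* call. */
    static long comparisons;

    /** SKIP-style: step over each unwanted cell with a single check. */
    static int[] pageBySkip(int[] row, int offset, int limit) {
        comparisons = 0;
        int pos = 0;
        for (int skipped = 0; skipped < offset && pos < row.length; skipped++) {
            comparisons++; // look at the current cell, then move to its neighbour
            pos++;
        }
        return copyPage(row, pos, limit);
    }

    /** NEXT_COL-style: after each column, seek for the first cell past it. */
    static int[] pageByReseek(int[] row, int offset, int limit) {
        comparisons = 0;
        int pos = 0;
        for (int skipped = 0; skipped < offset && pos < row.length; skipped++) {
            pos = firstGreaterThan(row, row[pos], pos);
        }
        return copyPage(row, pos, limit);
    }

    /** Hypothetical reseek: binary search in row[from..] for the first key > key. */
    static int firstGreaterThan(int[] row, int key, int from) {
        int lo = from, hi = row.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            comparisons++;
            if (row[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Copies up to limit cells starting at start; empty if start is past the end. */
    static int[] copyPage(int[] row, int start, int limit) {
        int n = Math.max(0, Math.min(limit, row.length - start));
        int[] page = new int[n];
        System.arraycopy(row, start, page, 0, n);
        return page;
    }

    public static void main(String[] args) {
        int[] row = new int[1_000_000];
        for (int i = 0; i < row.length; i++) row[i] = i;

        pageBySkip(row, 999_990, 10);
        System.out.println("skip comparisons:   " + comparisons);
        pageByReseek(row, 999_990, 10);
        System.out.println("reseek comparisons: " + comparisons);
    }
}
```

Both methods return the same page; only the comparison count differs. The real cost of a reseek inside a RegionServer depends on the scanner implementation, so the counts here show the shape of the problem and are not a prediction of the ~1s versus ~100ms figures reported above.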