Date: Wed, 28 Dec 2011 23:59:30 +0000 (UTC)
From: "Liyin Tang (Updated) (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-5032) Add other DELETE type information into the delete bloom filter to optimize the time range query

[ https://issues.apache.org/jira/browse/HBASE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel 
] Liyin Tang updated HBASE-5032:
------------------------------

Description:
To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962)

So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column.

From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values. So we can reuse the same bloom filter for all other kinds of delete information, such as delete columns or delete.

was:
To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962)

So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column.
From the implementation prospective, we have already have a delete family bloom filter which contains all the

Summary: Add other DELETE type information into the delete bloom filter to optimize the time range query (was: Add other DELETE or DELETE into the delete bloom filter)

> Add other DELETE type information into the delete bloom filter to optimize the time range query
> -----------------------------------------------------------------------------------------------
>
> Key: HBASE-5032
> URL: https://issues.apache.org/jira/browse/HBASE-5032
> Project: HBase
> Issue Type: Improvement
> Reporter: Liyin Tang
> Assignee: Liyin Tang
>
> To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962)
> So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column.
> From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values. So we can reuse the same bloom filter for all other kinds of delete information, such as delete columns or delete.
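
The seek decision described in the issue can be illustrated with a minimal, self-contained sketch. It is not HBase code: the class DeleteBloomSketch, its methods, the bit-array size, and the hashing scheme are all hypothetical and chosen only for illustration. It assumes a Bloom filter keyed on (row, column) that records every delete marker, and it uses Long.MAX_VALUE to stand for "start from the newest KV of the pair".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration only; none of these names are HBase classes.
final class DeleteBloomSketch {
    private static final int BITS = 1 << 16;  // assumed filter size
    private static final int HASHES = 3;      // assumed number of hash probes
    private final BitSet bits = new BitSet(BITS);

    // Derive the i-th bit position for a (row, column) key.
    private static int position(String row, String column, int i) {
        byte[] key = (row + '\0' + column).getBytes(StandardCharsets.UTF_8);
        int h = Arrays.hashCode(key) * 31 + i * 0x9E3779B9;
        return Math.floorMod(h ^ (h >>> 15), BITS);
    }

    // Record that some delete marker (of any type) exists for (row, column).
    void addDelete(String row, String column) {
        for (int i = 0; i < HASHES; i++) {
            bits.set(position(row, column, i));
        }
    }

    // false: definitely no delete was recorded; true: there may be one.
    boolean mightHaveDelete(String row, String column) {
        for (int i = 0; i < HASHES; i++) {
            if (!bits.get(position(row, column, i))) {
                return false;
            }
        }
        return true;
    }

    // Jump straight to timeRangeMax only when no delete can exist for the
    // pair; otherwise start from the newest KV so delete markers are seen.
    long seekTimestamp(String row, String column, long timeRangeMax) {
        return mightHaveDelete(row, column) ? Long.MAX_VALUE : timeRangeMax;
    }

    public static void main(String[] args) {
        DeleteBloomSketch bloom = new DeleteBloomSketch();
        // Empty filter: safe to seek directly to the top of the time range.
        System.out.println(bloom.seekTimestamp("row1", "colA", 100L)); // 100
        bloom.addDelete("row1", "colA");
        // A delete may exist now, so the scan must start from the newest KV.
        System.out.println(
            bloom.seekTimestamp("row1", "colA", 100L) == Long.MAX_VALUE); // true
    }
}
```

Because a Bloom filter can return false positives but never false negatives, a "maybe" answer only costs the extra seek the scan would have done anyway, while a "no" answer is always safe to act on.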