Date: Tue, 27 Mar 2012 06:03:34 +0000 (UTC)
From: "stack (Updated) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1198015554.21931.1332828214407.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <935088894.4873.1316747487232.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HBASE-4465) Lazy-seek optimization for StoreFile scanners

     [ https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4465:
-------------------------
    Release Note: Check the most recent file first before seeking all other files in a Store.
    Hadoop Flags: Reviewed

Thanks Mikhail.

> Lazy-seek optimization for StoreFile scanners
> ---------------------------------------------
>
>                 Key: HBASE-4465
>                 URL: https://issues.apache.org/jira/browse/HBASE-4465
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>              Labels: optimization, seek
>             Fix For: 0.89.20100924, 0.94.0
>
>         Attachments: HBASE-4465_Lazy-seek_optimization_for_St-20111005121052-b2ea8753.patch
>
>
> Previously, if we had several StoreFiles for a column family in a region, we would seek in each of them and only then merge the results, even though the row/column we are looking for might only be in the most recent (and the smallest) file. Now we prioritize our reads from those files so that we check the most recent file first. This is done by doing a "lazy seek", which pretends that the next value in the StoreFile is (seekRow, seekColumn, lastTimestampInStoreFile). Because newer timestamps sort first, this fake KV sorts at or before any KV for that row/column that might actually occur in the file. So if we don't find the result in the files checked first, that fake KV will bubble up to the top of the KV heap and a real seek will be done. This is expected to significantly reduce the amount of disk IO (as of 09/22/2011 we are doing dark launch testing and measurement).
> This is joint work with Liyin Tang -- huge thanks to him for many helpful discussions on this and the idea of putting fake KVs with the highest timestamp of the StoreFile in the scanner priority queue.
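
The lazy-seek idea in the description can be illustrated with a minimal Java sketch. This is not the code from the attached patch. All names here (LazySeekSketch, KV, FileScanner, lazySeek, enforceSeek, seekAndPeek, demoFiles) are hypothetical, KV is a simplified stand-in for HBase's KeyValue, files are in-memory lists, and breaking ties in favour of the file with the higher seqId is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class LazySeekSketch {
    /** Simplified key (hypothetical; not HBase's KeyValue). */
    static final class KV {
        final String row, column;
        final long ts;
        KV(String row, String column, long ts) { this.row = row; this.column = column; this.ts = ts; }
    }

    /** KV order: row ascending, column ascending, then newest timestamp first. */
    static int compare(KV a, KV b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        return Long.compare(b.ts, a.ts);
    }

    /** Scanner over one "file": a list of KVs that is already in KV order. */
    static final class FileScanner {
        final long seqId;       // assumption: higher means a more recent file
        final List<KV> sorted;
        final long maxTs;       // highest timestamp stored in this file
        KV current;             // a fake KV until a real seek has been done
        boolean realSeekDone;
        int realSeeks;          // counts the expensive seeks, for illustration
        private String seekRow, seekColumn;

        FileScanner(long seqId, List<KV> sorted) {
            this.seqId = seqId;
            this.sorted = sorted;
            long max = Long.MIN_VALUE;
            for (KV kv : sorted) max = Math.max(max, kv.ts);
            this.maxTs = max;
        }

        /** Lazy seek: remember the target and expose a fake KV without reading the file. */
        void lazySeek(String row, String column) {
            seekRow = row;
            seekColumn = column;
            current = new KV(row, column, maxTs);
            realSeekDone = false;
        }

        /** Real seek: position on the first stored KV at or after (seekRow, seekColumn). */
        void enforceSeek() {
            realSeeks++;
            KV target = new KV(seekRow, seekColumn, Long.MAX_VALUE);
            current = null;
            for (KV kv : sorted) {
                if (compare(kv, target) >= 0) { current = kv; break; }
            }
            realSeekDone = true;
        }
    }

    /** Lazily seeks every scanner, then does real seeks only for scanners that reach the heap top. */
    static KV seekAndPeek(List<FileScanner> scanners, String row, String column) {
        PriorityQueue<FileScanner> heap = new PriorityQueue<>((a, b) -> {
            int c = compare(a.current, b.current);
            return c != 0 ? c : Long.compare(b.seqId, a.seqId); // tie: newer file first
        });
        for (FileScanner s : scanners) { s.lazySeek(row, column); heap.add(s); }
        while (!heap.isEmpty()) {
            FileScanner top = heap.poll();
            if (top.realSeekDone) return top.current; // a real KV is on top: done
            top.enforceSeek();                        // a fake KV bubbled up: seek for real
            if (top.current != null) heap.add(top);   // exhausted scanners drop out
        }
        return null;
    }

    /** Three made-up files, oldest first. */
    static List<FileScanner> demoFiles() {
        List<FileScanner> files = new ArrayList<>();
        files.add(new FileScanner(1, Arrays.asList(
            new KV("r0", "c1", 50), new KV("r1", "c1", 100), new KV("r2", "c1", 90))));
        files.add(new FileScanner(2, Arrays.asList(
            new KV("r1", "c2", 200), new KV("r3", "c1", 150))));
        files.add(new FileScanner(3, Arrays.asList(new KV("r1", "c1", 300))));
        return files;
    }

    public static void main(String[] args) {
        List<FileScanner> files = demoFiles();
        KV hit = seekAndPeek(files, "r1", "c1");
        int total = 0;
        for (FileScanner s : files) total += s.realSeeks;
        System.out.println(hit.row + "/" + hit.column + "@" + hit.ts + " real seeks: " + total);
    }
}
```

In this made-up data set, seeking (r1, c1) is answered from the newest file after a single real seek, and the two older files are never read. Their fake KVs carry lower timestamps, so they stay below the real KV in the heap.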