Date: Thu, 29 Sep 2011 18:42:46 +0000 (UTC)
From: "jiraposter@reviews.apache.org (Commented) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1736554769.8598.1317321766750.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get

    [ https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117524#comment-13117524 ]

jiraposter@reviews.apache.org commented on HBASE-2794:
------------------------------------------------------

bq.  On 2011-09-28 17:42:46, Ted Yu wrote:
bq.  > This is an important feature.
bq.  >
bq.  > Since the boolean parameter, forward, correlates so closely with reseek, can we give it a better name ?
bq.  > I was thinking about either reseek or forwardOnly.
bq.
bq.  Mikhail Bautin wrote:
bq.      We have a few diffs in the pipeline that depend on this one. Can we rename the boolean flag after we commit those diffs?

I am fine with the current name of forward.

- Ted

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2084/#review2137
-----------------------------------------------------------

On 2011-09-28 16:03:52, Mikhail Bautin wrote:
bq.
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2084/
bq.  -----------------------------------------------------------
bq.
bq.  (Updated 2011-09-28 16:03:52)
bq.
bq.
bq.  Review request for hbase.
bq.
bq.
bq.  Summary
bq.  -------
bq.
bq.  Previously we only used row-column Bloom filters for scans that only requested one column. We have seen production queries that request up to 200 columns, and with say ~6 store files per store (region / column family combination) this might have resulted in 1200 block read operations in the worst case. With this diff we will be avoiding seeks on store files that we know don't contain the row/column of interest when using an ExplicitColumnTracker. The performance should remain the same for column range queries.
bq.
bq.
bq.  This addresses bug HBASE-2794.
bq.      https://issues.apache.org/jira/browse/HBASE-2794
bq.
bq.
bq.  Diffs
bq.  -----
bq.
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e
bq.    src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION
bq.
bq.  Diff: https://reviews.apache.org/r/2084/diff
bq.
bq.
bq.  Testing
bq.  -------
bq.
bq.  Existing unit tests. A new unit test (TestScanWithBloomError). Load testing using HBaseTest.
bq.
bq.
bq.  Thanks,
bq.
bq.  Mikhail
bq.
bq.
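
For reference, the worst-case figure quoted in the Summary above (200 requested columns, about 6 store files per store) can be reproduced with a small sketch. The class and method names below are hypothetical and are not part of the patch. The second method assumes a per-file, per-column Bloom answer is available as a boolean matrix, which is an illustration only.

```java
final class BlockReadEstimate {
    // Worst case from the Summary: one block read per
    // (requested column, store file) pair.
    static long worstCaseBlockReads(int requestedColumns, int storeFiles) {
        return (long) requestedColumns * storeFiles;
    }

    // mightContain[f][c] is a hypothetical Bloom answer for store file f
    // and requested column c. Only "might contain" pairs still cost a seek.
    static long blockReadsAfterBloom(boolean[][] mightContain) {
        long reads = 0;
        for (boolean[] perFile : mightContain) {
            for (boolean hit : perFile) {
                if (hit) {
                    reads++;
                }
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseBlockReads(200, 6)); // prints 1200
    }
}
```

The second count is an upper bound on useful seeks only in the sense that Bloom filters can return false positives, so some of the remaining seeks may still find nothing.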

> ROWCOL bloom filter not used if multiple columns within same family are requested in a Get
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2794
>                 URL: https://issues.apache.org/jira/browse/HBASE-2794
>             Project: HBase
>          Issue Type: Improvement
>          Components: performance
>            Reporter: Kannan Muthukkaruppan
>             Fix For: 0.92.0
>
>
> Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():
> {code}
>         switch(bloomFilterType) {
>           case ROW:
>             key = row;
>             break;
>           case ROWCOL:
>             if (columns.size() == 1) {
>               byte[] col = columns.first();
>               key = Bytes.add(row, col);
>               break;
>             }
>             //$FALL-THROUGH$
>           default:
>             return true;
>         }
> {code}
> If columns.size > 1, then we currently don't take advantage of the bloom filter. We should optimize this to check bloom for each of columns and if none of the columns are present in the bloom avoid opening the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
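
The proposal at the end of the issue description (check the Bloom filter once per requested column and skip the file only if every check fails) could look roughly like the sketch below. This is not the committed patch and not the real StoreFile API: `RowColBloomSketch`, `BloomType`, `concat`, and the `Predicate<byte[]>` standing in for the Bloom membership test are all assumptions made for illustration. `concat` mirrors what `Bytes.add(row, col)` does in the quoted snippet.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class RowColBloomSketch {
    enum BloomType { NONE, ROW, ROWCOL }

    // Stand-in for Bytes.add(row, col): row bytes followed by column bytes.
    static byte[] concat(byte[] row, byte[] col) {
        byte[] key = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    // Returns false only when the (hypothetical) Bloom test rules out
    // every key of interest, so the store file can be skipped.
    static boolean shouldSeek(BloomType type, byte[] row,
                              List<byte[]> columns,
                              Predicate<byte[]> mightContain) {
        switch (type) {
            case ROW:
                return mightContain.test(row);
            case ROWCOL:
                if (columns.isEmpty()) {
                    return true; // no explicit columns, so nothing to check
                }
                for (byte[] col : columns) {
                    if (mightContain.test(concat(row, col))) {
                        return true; // at least one column may be present
                    }
                }
                return false;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        // An exact set plays the role of the Bloom filter in this demo.
        Set<String> keys = new HashSet<>(Arrays.asList("r1c2"));
        Predicate<byte[]> bloom =
            k -> keys.contains(new String(k, StandardCharsets.UTF_8));
        byte[] row = "r1".getBytes(StandardCharsets.UTF_8);
        List<byte[]> cols = Arrays.asList(
            "c1".getBytes(StandardCharsets.UTF_8),
            "c2".getBytes(StandardCharsets.UTF_8));
        System.out.println(shouldSeek(BloomType.ROWCOL, row, cols, bloom));
    }
}
```

Returning false here is safe because a Bloom filter never reports a false negative. A false positive only costs a seek that finds nothing.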