Subject: Re: Review Request: Reseeking directly to required columns
From: "Ryan Rawson"
To: "Karthik Ranganathan", "Jonathan Gray", "Kannan Muthukkaruppan", stack@duboce.net
Cc: dev@hbase.apache.org, "Ryan Rawson", jiraposter@review.hbase.org, "Pranav Khaitan"
Reply-To: dev@hbase.apache.org
Date: Wed, 29 Sep 2010 20:37:18 -0000
Message-ID: <20100929203718.12040.11323@ip-10-202-7-187.ec2.internal>
In-Reply-To:
<20100929015219.5746.48974@ip-10-202-7-187.ec2.internal>
References: <20100929015219.5746.48974@ip-10-202-7-187.ec2.internal>


On 2010-09-28 18:52:19, Pranav Khaitan wrote:
> > Ryan:
> >
> > Additionally, as part of the commit, you added the optimization for SEEK_NEXT_ROW. Had a question on the getKeyForNextRow() function:
> >
> > +
> > +  public KeyValue getKeyForNextRow(KeyValue kv) {
> > +    return KeyValue.createLastOnRow(
> > +        kv.getBuffer(), kv.getRowOffset(), kv.getRowLength(),
> > +        null, 0, 0,
> > +        null, 0, 0);
> > +  }
> >
> > Is a KeyValue constructed with a null column family & qualifier indeed larger than all KeyValues in that row? Just want to make sure it doesn't reseek back to the very top of the current row :). [Note: I haven't spent time trying to confirm this, but was concerned that the null column family & qualifier might end up causing this KV to be smaller than the other KVs for the row. Will try and test it out to confirm.]
> >
> >

This code in the comparator implements the 'last key on row' ordering:

    // compare row code here
    if (lcolumnlength == 0 && ltype == Type.Minimum.getCode()) {
      return 1; // left is bigger.
    }
    if (rcolumnlength == 0 && rtype == Type.Minimum.getCode()) {
      return -1;
    }
    // rest of comparator here

If the right column has a length of 0 (i.e. it was constructed with a null family & qualifier) _and_ has a type of Minimum, then we say that the left is smaller than the right, and vice versa. So in this code in HFile:

    int comp = this.reader.comparator.compare(key, offset, length,
        block.array(), block.arrayOffset() + block.position(), klen);

the 'key' is the target key (the 'last on row' key). So we'd hit the 'left is bigger' branch, and we would iterate past the entire row until we get to the next row.


- Ryan


-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: http://review.cloudera.org/r/781/#review1350
-----------------------------------------------------------


On 2010-09-16 00:57:12, Pranav Khaitan wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/781/
> -----------------------------------------------------------
>
> (Updated 2010-09-16 00:57:12)
>
>
> Review request for hbase, stack, Jonathan Gray, Karthik Ranganathan, and Kannan Muthukkaruppan.
>
>
> Summary
> -------
>
> Optimize reads for specific columns by reseeking between scans. Use the reseek logic to jump directly to the next required column rather than reading the current column.
>
> Big performance gain for queries with sparse columns. Not advantageous for dense ones. Consider this before committing.
>
> Further suggestions/questions are welcome!
>
>
> This addresses bugs HBASE-2450, HBASE-2916 and HBASE-2959.
>     http://issues.apache.org/jira/browse/HBASE-2450
>     http://issues.apache.org/jira/browse/HBASE-2916
>     http://issues.apache.org/jira/browse/HBASE-2959
>
>
> Diffs
> -----
>
>   trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java 990674
>   trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 990674
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 990674
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 990674
>
> Diff: http://review.cloudera.org/r/781/diff
>
>
> Testing
> -------
>
> All existing tests pass and make significant use of this code.
>
> Added a new test file called TestColumnSeeking along with another patch at https://review.cloudera.org/r/780/.
>
>
> Thanks,
>
> Pranav
>
>
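

For anyone who wants to poke at the 'last on row' ordering from the reply above without a full HBase build, here is a rough standalone sketch. It is not the real KeyValue comparator: the Key class, the TYPE_MINIMUM / TYPE_PUT constants and the compareBytes helper are all made up for illustration, and the timestamp/type ordering that the real comparator applies after the column check is left out. It only mirrors the two special-case branches quoted in the reply.

```java
import java.nio.charset.StandardCharsets;

/** Simplified stand-in for an HBase key; NOT the real KeyValue class. */
final class LastOnRowSketch {
    // Hypothetical type codes, used only inside this sketch.
    static final int TYPE_MINIMUM = 0;
    static final int TYPE_PUT = 4;

    static final class Key {
        final byte[] row;
        final byte[] column; // family + qualifier; empty when built with nulls
        final int type;

        Key(String row, String column, int type) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.column = column.getBytes(StandardCharsets.UTF_8);
            this.type = type;
        }
    }

    /** Builds the 'last on row' marker: empty column plus the Minimum type. */
    static Key lastOnRow(String row) {
        return new Key(row, "", TYPE_MINIMUM);
    }

    /** Unsigned lexicographic byte comparison. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    static int compare(Key left, Key right) {
        // compare row first
        int cmp = compareBytes(left.row, right.row);
        if (cmp != 0) {
            return cmp;
        }
        // same row: an empty column with type Minimum sorts after everything
        if (left.column.length == 0 && left.type == TYPE_MINIMUM) {
            return 1; // left is bigger
        }
        if (right.column.length == 0 && right.type == TYPE_MINIMUM) {
            return -1;
        }
        // rest of comparator: columns only (timestamp/type ordering omitted)
        return compareBytes(left.column, right.column);
    }

    public static void main(String[] args) {
        Key target = lastOnRow("row1");
        // Same row: the target should compare as bigger than any real key.
        System.out.println(compare(target, new Key("row1", "cf:zzz", TYPE_PUT)) > 0);
        // Next row: the target should compare as smaller, so the seek stops there.
        System.out.println(compare(target, new Key("row2", "cf:a", TYPE_PUT)) < 0);
    }
}
```

With the target on the left, as in the HFile call quoted above, every key in row1 compares as smaller than the target, and the first key of row2 compares as larger, which is the point at which the seek would stop.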