From: "Jim Kellerman (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Wed, 27 Jun 2007 11:14:26 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

[ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508609 ]

Jim Kellerman commented on HADOOP-1531:
---------------------------------------

In general, I have no
objections to this change. However, I do have a few comments:

- In HClient, the ClientScanner constructor that does not take a filter is no longer needed. That constructor is only called from obtainScanner, and obtainScanner(columns, startRow) just calls obtainScanner(columns, startRow, filter) with null for the filter.

- In HClient.ClientScanner, shouldn't the call to server.openScanner be conditional, so that it calls either the HRegionServerInterface.openScanner overload that takes a filter or the one that does not? (You can't pass a null over an RPC.) For example:

    try {
      if (this.filter == null) {
        this.scannerId = this.server.openScanner(info.regionInfo.regionName,
            this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
      } else {
        this.scannerId = this.server.openScanner(info.regionInfo.regionName,
            this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW,
            filter);
      }

- Finally, I would like to see a test case that uses a filter. The existing tests will ensure that there are no regressions.

> Add RowFilter to HRegion.HScanner
> ---------------------------------
>
>                 Key: HADOOP-1531
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1531
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.14.0
>            Reporter: James Kennedy
>            Assignee: James Kennedy
>         Attachments: RowFilter.patch
>
>
> I've implemented a RowFilterInterface and a RowFilter implementation. The filter is passed to HRegion.HScanner via HClient.openScanner(), though it is an entirely optional parameter.
> HScanner applies the filter in the next() call by iterating until it encounters a row that is not filtered out by the RowFilter. The filter applies criteria based on row keys and/or column data values.
> Null values are a little tricky, since the resultSet in that loop may represent nulls either as absent columns or as DELETED_BYTES.
> Nevertheless, null cases are handled by the filter, so you can, for example, retrieve all rows where column X = null.
> The initial RowFilter implementation is limited in several ways:
> * Equality tests only, against literal values. No !=, <, >, etc., and no col1 == col2. This is a straight byte[] comparison.
> * Multiple column criteria are treated as an implicit conjunction; no disjunction is possible.
> * Row key criteria can only be a regular expression.
> * Row key criteria are independent of column criteria. No "if rowkey.matches(A) and col1 == B", although the interface was designed to allow for that.
> But it should be easy to write an improved RowFilterInterface implementation that handles most of the above without having to change code elsewhere.
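The filtering behaviour described in the issue (next() skips rows until one passes the filter; a regular expression on the row key; implicitly AND-ed byte[] equality on columns; a null column showing up either as an absent column or as DELETED_BYTES) can be sketched in plain Java. This is not the code from RowFilter.patch: RowFilterSketch, SimpleRowFilter, FilteringScanner and the in-memory TreeMap standing in for an HRegion are all hypothetical, the RowFilterInterface signature below is an assumption, and the DELETED_BYTES value is a placeholder. The main method exercises the filter roughly the way a filter test case might.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class RowFilterSketch {
  // Hypothetical placeholder for the DELETED_BYTES marker; not the real HBase value.
  static final byte[] DELETED_BYTES = "<deleted>".getBytes(StandardCharsets.UTF_8);

  // Hypothetical filter contract (assumed signature): return true to skip the row.
  interface RowFilterInterface {
    boolean filter(String rowKey, Map<String, byte[]> columns);
  }

  // Row-key regex AND-ed with exact byte[] equality on each column.
  // A criterion value of null means "this column must be null".
  static final class SimpleRowFilter implements RowFilterInterface {
    private final Pattern rowKeyRegex;          // null: no row-key criterion
    private final Map<String, byte[]> criteria; // column name -> expected value

    SimpleRowFilter(String rowKeyRegex, Map<String, byte[]> criteria) {
      this.rowKeyRegex = rowKeyRegex == null ? null : Pattern.compile(rowKeyRegex);
      this.criteria = criteria;
    }

    // A null may appear either as an absent column or as the DELETED_BYTES marker.
    private static boolean isNull(byte[] value) {
      return value == null || Arrays.equals(value, DELETED_BYTES);
    }

    @Override
    public boolean filter(String rowKey, Map<String, byte[]> columns) {
      if (rowKeyRegex != null && !rowKeyRegex.matcher(rowKey).matches()) {
        return true;
      }
      for (Map.Entry<String, byte[]> c : criteria.entrySet()) {
        byte[] actual = columns.get(c.getKey());
        boolean ok = c.getValue() == null
            ? isNull(actual)
            : !isNull(actual) && Arrays.equals(actual, c.getValue());
        if (!ok) {
          return true; // implicit conjunction: one failed criterion filters the row
        }
      }
      return false;
    }
  }

  // Stand-in for HScanner: next() keeps iterating until a row passes the filter.
  static final class FilteringScanner {
    private final Iterator<Map.Entry<String, Map<String, byte[]>>> rows;
    private final RowFilterInterface filter; // optional; null disables filtering

    FilteringScanner(TreeMap<String, Map<String, byte[]>> table, RowFilterInterface filter) {
      this.rows = table.entrySet().iterator();
      this.filter = filter;
    }

    // Returns the next unfiltered row key, or null when the scan is exhausted.
    String next() {
      while (rows.hasNext()) {
        Map.Entry<String, Map<String, byte[]>> row = rows.next();
        if (filter == null || !filter.filter(row.getKey(), row.getValue())) {
          return row.getKey();
        }
      }
      return null;
    }
  }

  static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

  static TreeMap<String, Map<String, byte[]>> sampleTable() {
    TreeMap<String, Map<String, byte[]>> table = new TreeMap<>();
    table.put("key9", Collections.emptyMap());
    table.put("row1", Collections.singletonMap("info:x", b("a")));
    table.put("row2", Collections.singletonMap("info:x", DELETED_BYTES)); // null via marker
    table.put("row3", Collections.singletonMap("info:y", b("b")));        // null via absence
    return table;
  }

  static List<String> scanAll(FilteringScanner scanner) {
    List<String> keys = new ArrayList<>();
    for (String k = scanner.next(); k != null; k = scanner.next()) {
      keys.add(k);
    }
    return keys;
  }

  public static void main(String[] args) {
    RowFilterInterface xIsNull = new SimpleRowFilter("row\\d", Collections.singletonMap("info:x", null));
    RowFilterInterface xIsA = new SimpleRowFilter(null, Collections.singletonMap("info:x", b("a")));
    System.out.println(scanAll(new FilteringScanner(sampleTable(), null)));    // [key9, row1, row2, row3]
    System.out.println(scanAll(new FilteringScanner(sampleTable(), xIsNull))); // [row2, row3]
    System.out.println(scanAll(new FilteringScanner(sampleTable(), xIsA)));    // [row1]
  }
}
```

Keeping the skip loop in the scanner and all criteria in the filter object is what would let a richer RowFilterInterface implementation (disjunctions, range comparisons, row-key conditions tied to column values) be swapped in without touching the scanner, which is the extensibility the issue description argues for.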