From: tigertail
To: hbase-user@hadoop.apache.org
Date: Wed, 10 Dec 2008 19:06:26 -0800 (PST)
Subject: Re: map reduce range of records from hbase table
Message-ID: <20948685.post@talk.nabble.com>
In-Reply-To: <839ba01c0810082250p6a529a03w7675525d3f9afe8c@mail.gmail.com>

Hi Cedric,

Could you share with me your version of getSplits that feeds only a subset of records to the job? I expect your method can select the subset based on row keys as well as some column values.

Thank you.

Cedric Ho wrote:
>
> Thanks for the solutions. I've tried overriding getSplits and it does
> what I need.
>
> But for the RowFilter, I guess it would also need to scan through all
> records and do filtering. So wouldn't it be the same if I do the
> filtering myself during the map phase?
>
> Cedric
>
>
> On Thu, Oct 9, 2008 at 5:13 AM, stack wrote:
>> Cedric Ho wrote:
>>>
>>> Hi all,
>>>
>>> I am using 0.18.0 and have successfully used data from an hbase table as
>>> input to my map/reduce job.
>>>
>>> I wonder how to specify a subset of records from a table instead of
>>> taking all records as input,
>>> such as a range of the row keys or maybe specific values of certain
>>> columns.
>>>
>>
>> You'll have to subclass the TableInputFormat.
>>
>> There is an example in the javadoc on subclassing TIF:
>> http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
>> (Sorry, the example is mangled. Do a get of the html source to see
>> non-garbled code).
>>
>> The example shows you how to set a filter. Filters can filter on rows
>> and values.
>>
>> To work against a subset, you'd probably need to play with getSplits in
>> your subclass. By default, it basically returns as many splits as there
>> are regions in your table, so it's always the whole table. Filters could stop
>> unwanted rows being returned, but maybe it's better if the rows weren't
>> considered in the first place; hence the need for getSplits subclassing.
>>
>> St.Ack
>>
>

--
View this message in context: http://www.nabble.com/map-reduce-range-of-records-from-hbase-table-tp19873787p20948685.html
Sent from the HBase User mailing list archive at Nabble.com.
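
Cedric's getSplits override was not posted in this message, so the following is only a rough sketch of the key-range arithmetic such an override might perform, not his code and not HBase API. It assumes each region covers a half-open row-key range, with an empty key meaning "unbounded" on that side, and it clips every region to a wanted range, dropping regions that fall outside it. The names RangeSplitSketch, KeyRange, and clipToRange are hypothetical, and keys are plain Strings here rather than the byte arrays HBase uses.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: models the range clipping a getSplits override could do.
 * Nothing here is HBase API; all names are hypothetical.
 */
class RangeSplitSketch {

    /** Half-open key range [start, end); "" means unbounded on that side. */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Larger of two start keys; "" sorts before every other key. */
    static String maxStart(String a, String b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    /** Smaller of two end keys; "" (unbounded above) loses to any real key. */
    static String minEnd(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /** Returns one clipped range per region that overlaps the wanted range. */
    static List<KeyRange> clipToRange(List<KeyRange> regions, KeyRange wanted) {
        List<KeyRange> splits = new ArrayList<>();
        for (KeyRange region : regions) {
            String start = maxStart(region.start, wanted.start);
            String end = minEnd(region.end, wanted.end);
            // Keep the split only if the clipped range is non-empty.
            if (end.isEmpty() || start.compareTo(end) < 0) {
                splits.add(new KeyRange(start, end));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three made-up regions covering the whole table.
        List<KeyRange> regions = new ArrayList<>();
        regions.add(new KeyRange("", "g"));
        regions.add(new KeyRange("g", "p"));
        regions.add(new KeyRange("p", ""));
        for (KeyRange split : clipToRange(regions, new KeyRange("k", "r"))) {
            System.out.println(split);
        }
    }
}
```

In a real job this logic would presumably sit inside the getSplits method of a TableInputFormatBase subclass, with the clipped ranges turned into table splits; the exact signatures are not shown in this thread. Clipping splits only narrows the input by row key. Selecting on column values would still need a filter, or a check in the map phase, as discussed above.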