Message-ID: <49DC605A.3030507@worldlingo.com>
Date: Wed, 08 Apr 2009 10:29:14 +0200
From: Lars George
MIME-Version: 1.0
To: hbase-user@hadoop.apache.org
CC: Ninad
Subject: Re: help with map-reduce
References: <384813770904070226i3b997757o657df44da1353e55@mail.gmail.com>
 <78568af10904070235x463862a1qefcffe6db8c4faac@mail.gmail.com>
 <384813770904070250p5a3e4d6eh9bbf3e0bbc7c6345@mail.gmail.com>
 <49DB59B3.2070106@worldlingo.com>
 <384813770904070729s6fa8a891r1e520d556581ac59@mail.gmail.com>
In-Reply-To: <384813770904070729s6fa8a891r1e520d556581ac59@mail.gmail.com>
Content-Type: multipart/mixed; boundary="------------050400080405010007080609"
X-Virus-Checked: Checked by ClamAV on apache.org

--------------050400080405010007080609
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi Rakhi,

Wow, same here. I copied your RowFilter line, and when I press the dot
key and the fly-up opens, Eclipse hangs. Nice... NOT!

Apart from that, you are also saying that the filter is not working as
expected? Do you use any column qualifiers for the "Status:" column? Are
the values in the correct casing, i.e. are the values stored in
uppercase as you have it in your example below? I assume the comparison
is done byte by byte, so case matters.

Please give us more details, maybe a small sample table dump, so that we
can test this.

Lars

Rakhi Khatwani wrote:
> Hi,
> I did try the filter... but using ColumnValueFilter. I declared a
> ColumnValueFilter as follows:
>
> public class TableInputFilter extends TableInputFormat
>     implements JobConfigurable {
>
>   public void configure(final JobConf jobConf) {
>     setHtable(tablename);
>     setInputColumns(columnName);
>
>     final RowFilterInterface colFilter =
>         new ColumnValueFilter("Status:".getBytes(),
>             ColumnValueFilter.CompareOp.EQUAL,
>             "UNCOLLECTED".getBytes());
>     setRowFilter(colFilter);
>   }
> }
>
> and then I use my class as the input format for my map function.
>
> In my map function, I set my log to display the value of my Status column
> family.
>
> When I execute my map-reduce job, it displays "Status:: Uncollected"
> for some rows and Status = "Collected" for the rest of the rows.
>
> But what I want is to send only those records whose 'Status:' is
> uncollected.
>
> I even considered using the filterRow method, described by the API as
> follows:
>
>   boolean filterRow(SortedMap columns)
>       Filter on the fully assembled row.
>
> But as soon as I type colFilter followed by a '.', my Eclipse hangs.
> It's really weird... I have tried it on 3 different machines (2 machines on
> Linux running Eclipse Ganymede 3.4 and one on Windows using MyEclipse).
>
> I dunno if I am going wrong somewhere.
>
> Thanks,
> Raakhi
>
> On Tue, Apr 7, 2009 at 7:18 PM, Lars George wrote:
>
>> Hi Rakhi,
>>
>> The way the filters work is that you either use the supplied filters or
>> create your own subclasses - but then you will have to deploy that class to
>> all RegionServers and add it to their respective hbase-env.sh (in the
>> "export HBASE_CLASSPATH" variable). We are currently discussing whether
>> this could be done dynamically
>> (https://issues.apache.org/jira/browse/HBASE-1288).
>>
>> Once you have done that, or if you use one of the supplied ones, you can
>> assign the filter by overriding the TableInputFormat's configure() method,
>> like so:
>>
>>   public void configure(JobConf job) {
>>     RegExpRowFilter filter = new RegExpRowFilter("ABC.*");
>>     setRowFilter(filter);
>>   }
>>
>> As Tim points out, setting the whole thing up is done in your main M/R tool
>> based application, similar to:
>>
>>   JobConf job = new JobConf(...);
>>   TableMapReduceUtil.initTableMapJob("", "",
>>       IdentityTableMap.class,
>>       ImmutableBytesWritable.class, RowResult.class, job);
>>   job.setReducerClass(MyTableReduce.class);
>>   job.setInputFormat(MyTableInputFormat.class);
>>   job.setOutputFormat(MyTableOutputFormat.class);
>>
>> Of course, this depends on which classes you want to replace, and on
>> whether this is a Reduce-oriented job (meaning a default identity + filter
>> map, with all the work done in the Reduce phase) or the other way around.
>> But the principles and filtering are the same.
>>
>> HTH,
>> Lars
>>
>> Rakhi Khatwani wrote:
>>
>>> Thanks Ryan, I will try that.
>>>
>>> On Tue, Apr 7, 2009 at 3:05 PM, Ryan Rawson wrote:
>>>
>>>> There is a server-side mechanism to filter rows; it's found in the
>>>> org.apache.hadoop.hbase.filter package. I'm not sure how this interops
>>>> with the TableInputFormat exactly.
>>>>
>>>> Setting a filter to reduce the number of rows returned is pretty much
>>>> exactly what you want.
>>>>
>>>> On Tue, Apr 7, 2009 at 2:26 AM, Rakhi Khatwani wrote:
>>>>
>>>>> Hi,
>>>>> I have a map-reduce program with which I read from an HBase table.
>>>>> In my map program I check if the column value of a is xxx; if yes, then
>>>>> continue with processing, else skip it.
>>>>> However, if my table is really big, most of my time in the map gets
>>>>> wasted processing unwanted rows.
>>>>> Is there any way through which we could send a subset of rows (based on
>>>>> the value of a particular column family) to the map?
>>>>>
>>>>> I have also gone through TableInputFormatBase but am not able to figure
>>>>> out how to set the input format if we are using the TableMapReduceUtil
>>>>> class to initialize table map jobs. Or is there any other way I could
>>>>> use it?
>>>>>
>>>>> Thanks in advance,
>>>>> Raakhi.

--------------050400080405010007080609--
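
On the casing question raised in the reply at the top of this thread: if the
filter's EQUAL comparison is byte-wise, as the reply assumes, then a stored
value of "Uncollected" would never match a filter value of "UNCOLLECTED". The
stand-alone sketch below only illustrates that effect with plain JDK calls.
StatusBytes and sameBytes are made-up names for this illustration; they are
not part of HBase, and the sketch does not call the real ColumnValueFilter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not an HBase class): compares two values the way a
// byte-for-byte EQUAL check would, so differences in casing cause a mismatch.
final class StatusBytes {

    // Returns true only if both strings encode to identical byte sequences.
    static boolean sameBytes(String stored, String wanted) {
        byte[] a = stored.getBytes(StandardCharsets.UTF_8);
        byte[] b = wanted.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        // Same casing: the byte sequences are identical.
        System.out.println(sameBytes("UNCOLLECTED", "UNCOLLECTED")); // true
        // Mixed case in the table vs. uppercase in the filter: no match.
        System.out.println(sameBytes("Uncollected", "UNCOLLECTED")); // false
    }
}
```

If the table really stores "Uncollected", passing that exact spelling to the
filter would be the first thing to try, under the same byte-wise assumption.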