From: anil gupta
Date: Tue, 15 May 2012 17:30:42 -0700
Subject: Re: Coprocessor Aggregation supposed to be ~20x slower than Scans?
To: user@hbase.apache.org

Hi Ted,

I looked into hbase-0.92.0-cdh4b1-20120206.193413-23-sources.jar and it
also doesn't have it.

On Tue, May 15, 2012 at 5:07 PM, Ted Yu wrote:

> Why did you need to decompile ?
> Here is the source code:
>
> https://repository.cloudera.com/artifactory/public/org/apache/hbase/hbase/0.92.0-cdh4b1-SNAPSHOT/
>
> On Tue, May 15, 2012 at 4:58 PM, anil gupta wrote:
>
> > Hi Ted,
> >
> > I decompiled the hbase-0.92.0-cdh4b1.jar using JD-GUI, and in the
> > validateParameters method I don't find that condition.
> >
> > Thanks,
> > Anil
> >
> > On Tue, May 15, 2012 at 1:37 PM, Ted Yu wrote:
> >
> > > I checked the code in Apache HBase 0.92 and trunk. I see the following
> > > line in validateParameters():
> > >     !Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW))) {
> > >
> > > Can you confirm that the bug is in cdh4b1 only ?
> > >
> > > Sorry for not doing the validation earlier.
> > >
> > > On Tue, May 15, 2012 at 12:09 PM, anil gupta wrote:
> > >
> > > > Oh, I see. Now if I look closely at your Gmail ID I can see your
> > > > name. I was totally confused.
> > > >
> > > > So, you want to force the user to specify stopRow if the filter is
> > > > not used? What if the user just wants to scan the table from startRow
> > > > till the end of the table? In your solution the user will have to
> > > > explicitly set the stopRow as HConstants.EMPTY_END_ROW. Do we really
> > > > want to force this?
> > > >
> > > > As per your solution the code would look like this:
> > > > if(scan.hasFilter())
> > > > {
> > > >   if (scan == null
> > > >       || (Bytes.equals(scan.getStartRow(), scan.getStopRow())
> > > >           && !Bytes.equals(scan.getStartRow(), HConstants.EMPTY_START_ROW))
> > > >       || (Bytes.compareTo(scan.getStartRow(), scan.getStopRow()) > 0
> > > >           && !Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW))) {
> > > >     throw new IOException(
> > > >         "Agg client Exception: Startrow should be smaller than Stoprow");
> > > >   } else if (scan.getFamilyMap().size() != 1) {
> > > >     throw new IOException("There must be only one family.");
> > > >   }
> > > > }
> > > > else
> > > > {
> > > >   if (scan == null
> > > >       || (Bytes.equals(scan.getStartRow(), scan.getStopRow())
> > > >           && !Bytes.equals(scan.getStartRow(), HConstants.EMPTY_START_ROW))
> > > >       || Bytes.compareTo(scan.getStartRow(), scan.getStopRow()) > 0) {
> > > >     throw new IOException(
> > > >         "Agg client Exception: Startrow should be smaller than Stoprow");
> > > >   } else if (scan.getFamilyMap().size() != 1) {
> > > >     throw new IOException("There must be only one family.");
> > > >   }
> > > > }
> > > >
> > > > Let me know your thoughts.
> > > >
> > > > Thanks,
> > > > Anil
> > > >
> > > > On Tue, May 15, 2012 at 11:46 AM, Ted Yu wrote:
> > > >
> > > > > Anil:
> > > > > I am having trouble accessing JIRA.
> > > > >
> > > > > Ted Yu and Zhihong Yu are the same person :-)
> > > > >
> > > > > I think it would be good to remind user of aggregation client to
> > > > > narrow range of scan. That's why I proposed adding check of
> > > > > hasFilter().
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Tue, May 15, 2012 at 10:47 AM, Ted Yu wrote:
> > > > >
> > > > > > Take your time.
> > > > > > Once you complete your first submission, subsequent contributions
> > > > > > would be easier.
> > > > > >
> > > > > > On Tue, May 15, 2012 at 10:34 AM, anil gupta <anilgupta84@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Ted,
> > > > > > >
> > > > > > > I created the jira: https://issues.apache.org/jira/browse/HBASE-5999
> > > > > > > for fixing this.
> > > > > > >
> > > > > > > Creating the patch might take me some time (due to the learning
> > > > > > > curve) as this is the first time I would be creating a patch.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Anil Gupta
> > > > > > >
> > > > > > > On Mon, May 14, 2012 at 4:00 PM, Ted Yu wrote:
> > > > > > >
> > > > > > > > I was aware of the following change.
> > > > > > > >
> > > > > > > > Can you log a JIRA and attach the patch to it ?
> > > > > > > >
> > > > > > > > Thanks for trying out and improving aggregation client.
> > > > > > > >
> > > > > > > > On Mon, May 14, 2012 at 3:31 PM, anil gupta <anilgupta84@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Ted,
> > > > > > > > >
> > > > > > > > > If we change the if statement condition in the validateParameters
> > > > > > > > > method in AggregationClient.java to:
> > > > > > > > > if (scan == null
> > > > > > > > >     || (Bytes.equals(scan.getStartRow(), scan.getStopRow())
> > > > > > > > >         && !Bytes.equals(scan.getStartRow(), HConstants.EMPTY_START_ROW))
> > > > > > > > >     || (Bytes.compareTo(scan.getStartRow(), scan.getStopRow()) > 0
> > > > > > > > >         && *!Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW)* ))
> > > > > > > > >
> > > > > > > > > The condition marked in bold and italic will handle the case when
> > > > > > > > > the stopRow is not specified. IMHO, it's not an error if we are
> > > > > > > > > not specifying the stopRow.
> > > > > > > > > This is what I was looking for because in my case I didn't
> > > > > > > > > want to set the stop row as I am using a prefix filter. I have
> > > > > > > > > tested the above specified code and it works fine when I only
> > > > > > > > > specify the startRow. Is this a desirable functionality? If yes,
> > > > > > > > > should this be added to trunk?
> > > > > > > > >
> > > > > > > > > Here is the link for source of AggregationClient:
> > > > > > > > >
> > > > > > > > > http://grepcode.com/file_/repo1.maven.org/maven2/org.apache.hbase/hbase/0.92.0/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java/?v=source
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Anil Gupta
> > > > > > > > >
> > > > > > > > > On Mon, May 14, 2012 at 1:58 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Anil:
> > > > > > > > > > As code #3 shows, having stopRow helps narrow the range of rows
> > > > > > > > > > participating in aggregation.
> > > > > > > > > >
> > > > > > > > > > Do you have suggestion on how this process can be made more
> > > > > > > > > > user-friendly ?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Mon, May 14, 2012 at 1:47 PM, anil gupta <anilgupta84@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Ted,
> > > > > > > > > > >
> > > > > > > > > > > My bad, I missed a big difference between the Scan object I am
> > > > > > > > > > > using in my filter and the Scan object used in coprocessors. So,
> > > > > > > > > > > the scan object is not the same.
> > > > > > > > > > > Basically, I am doing filtering on the basis of a prefix of RowKey.
> > > > > > > > > > >
> > > > > > > > > > > So, in my filter I do this to build the scanner:
> > > > > > > > > > > Code 1:
> > > > > > > > > > > Filter filter = new PrefixFilter(Bytes.toBytes(strPrefix));
> > > > > > > > > > > Scan scan = new Scan();
> > > > > > > > > > > scan.setFilter(filter);
> > > > > > > > > > > scan.setStartRow(Bytes.toBytes(strPrefix)); // I don't set any stopRow in this scanner.
> > > > > > > > > > >
> > > > > > > > > > > In the coprocessor, I do the following for the scanner:
> > > > > > > > > > > Code 2:
> > > > > > > > > > > Scan scan = new Scan();
> > > > > > > > > > > scan.setFilter(new PrefixFilter(Bytes.toBytes(prefix)));
> > > > > > > > > > >
> > > > > > > > > > > I don't have startRow in the above code because if I use only the
> > > > > > > > > > > startRow in the coprocessor scanner then I get the following
> > > > > > > > > > > exception (due to this I removed the startRow from the CP scan
> > > > > > > > > > > object code):
> > > > > > > > > > > java.io.IOException: Agg client Exception: Startrow should be smaller than Stoprow
> > > > > > > > > > >   at org.apache.hadoop.hbase.client.coprocessor.AggregationClient.validateParameters(AggregationClient.java:116)
> > > > > > > > > > >   at org.apache.hadoop.hbase.client.coprocessor.AggregationClient.max(AggregationClient.java:85)
> > > > > > > > > > >   at com.intuit.ihub.hbase.poc.DummyClass.doAggregation(DummyClass.java:81)
> > > > > > > > > > >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > > > > > > > >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > > > > > > > > >
> > > > > > > > > > > I modified the above code #2 to add the stopRow also:
> > > > > > > > > > > Code 3:
> > > > > > > > > > > Scan scan = new Scan();
> > > > > > > > > > > scan.setStartRow(Bytes.toBytes(prefix));
> > > > > > > > > > > scan.setStopRow(Bytes.toBytes(String.valueOf(Long.parseLong(prefix)+1)));
> > > > > > > > > > > scan.setFilter(new PrefixFilter(Bytes.toBytes(prefix)));
> > > > > > > > > > >
> > > > > > > > > > > When I run the coprocessor with Code #3, it's blazing fast. It gives
> > > > > > > > > > > the result in around 200 milliseconds. :)
> > > > > > > > > > > Since this was just testing a coprocessor, I added the logic to add
> > > > > > > > > > > the stopRow manually. What is the reason that the Scan object in a
> > > > > > > > > > > coprocessor always requires stopRow along with startRow? (Code #1
> > > > > > > > > > > works fine even when I don't use stopRow.) Can this restriction be
> > > > > > > > > > > relaxed?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Anil Gupta
> > > > > > > > > > >
> > > > > > > > > > > On Mon, May 14, 2012 at 12:55 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Anil:
> > > > > > > > > > > > I think the performance was related to your custom filter.
> > > > > > > > > > > >
> > > > > > > > > > > > Please tell us more about the filter next time.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, May 14, 2012 at 12:31 PM, anil gupta <anilgupta84@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Stack,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'll look into Gary Helmling's post and try to do profiling of
> > > > > > > > > > > > > the coprocessor and share the results.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Anil Gupta
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, May 14, 2012 at 12:08 PM, Stack <stack@duboce.net> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, May 14, 2012 at 12:02 PM, anil gupta <anilgupta84@gmail.com> wrote:
> > > > > > > > > > > > > > > I loaded around 70 thousand 1-2KB records in HBase. For scans,
> > > > > > > > > > > > > > > with my custom filter I am able to get 97 rows in 500
> > > > > > > > > > > > > > > milliseconds, and for doing sum, max, min (built-in aggregations
> > > > > > > > > > > > > > > of HBase) on the same custom filter it's taking 11000
> > > > > > > > > > > > > > > milliseconds. Does this mean that coprocessor aggregation is
> > > > > > > > > > > > > > > supposed to be around ~20x slower than scans? Am I missing any
> > > > > > > > > > > > > > > trick over here?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > That seems like a high tax to pay for running CPs.
> > > > > > > > > > > > > > Can you dig in on where the time is being spent? (See another
> > > > > > > > > > > > > > recent note on this list or on dev where Gary Helmling talks
> > > > > > > > > > > > > > about how he did basic profiling of CPs).
> > > > > > > > > > > > > > St.Ack

--
Thanks & Regards,
Anil Gupta
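
For reference, the relaxed range check discussed in this thread can be sketched without any HBase classes. This is only a minimal illustration, not the AggregationClient code: ScanRangeSketch, compareRows, isValidRange and stopRowForPrefix are hypothetical names, and an empty byte array is assumed to stand in for HConstants.EMPTY_START_ROW and HConstants.EMPTY_END_ROW. stopRowForPrefix shows one possible way to derive an exclusive stop row from a byte prefix, as an alternative to the Long.parseLong(prefix)+1 approach in Code 3, which only works when the prefix is numeric.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Standalone sketch; does not depend on HBase. All names here are hypothetical. */
final class ScanRangeSketch {

    /** Unsigned lexicographic comparison of two row keys. */
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * Relaxed check: an empty stopRow is assumed to mean "scan to the end of
     * the table", so it is never treated as being before startRow.
     */
    static boolean isValidRange(byte[] startRow, byte[] stopRow) {
        if (startRow == null || stopRow == null) {
            return false;
        }
        // A non-empty start equal to the stop selects nothing.
        boolean sameNonEmptyRow = startRow.length > 0 && Arrays.equals(startRow, stopRow);
        // Start after stop is only an error when a stop row was actually given.
        boolean startAfterStop = stopRow.length > 0 && compareRows(startRow, stopRow) > 0;
        return !sameNonEmptyRow && !startAfterStop;
    }

    /**
     * Smallest row key that sorts after every key starting with the prefix:
     * drop trailing 0xFF bytes, then increment the last remaining byte.
     * Returns an empty array (no upper bound) if no such key exists.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "100".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints 101
        System.out.println(isValidRange(prefix, new byte[0]));        // prints true
        System.out.println(isValidRange(stop, prefix));               // prints false
    }
}
```

With a check along these lines, a scan that sets only a startRow and a PrefixFilter (as in Code 1) would pass validation, while setting the derived stop row as in Code 3 still narrows the range of rows taking part in the aggregation.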