Date: Fri, 15 Apr 2011 20:44:06 +0000 (UTC)
From: "jiraposter@reviews.apache.org (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1876209231.61274.1302900246164.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020426#comment-13020426 ]

jiraposter@reviews.apache.org commented on HBASE-1512:
------------------------------------------------------

bq. On 2011-04-15 20:21:01, Ted Yu wrote:
bq. > /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 84
bq. >
bq. >
bq. > This is the first code review that evolves into a design session in my career - exciting.
bq. > I think we should relax the initial assumption.

I still think I would go with one family: families are quite separate entities (HTable design-wise), and I don't see any use for computing aggregates across multiple column families. If that is what is needed, it probably suggests some schema design rethinking.
The point I raised was that the object we are now riding upon supports multiple families (which is very relevant for scanning a table), but we don't need that in real usage. So the point to consider is whether we should support it or not.
Moreover, as the requirements are evolving (and I guess they will continue to do so), it might change again. I am happy as long as it is moving in the right direction.

- himanshu

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review488
-----------------------------------------------------------

On 2011-04-13 08:37:14, Ted Yu wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/585/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-04-13 08:37:14)
bq.
bq.
bq. Review request for hbase and Gary Helmling.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This patch provides a reference implementation for aggregate function support through the Coprocessor framework.
bq. The ColumnInterpreter interface allows the client to specify how the value's byte array is interpreted.
bq.
Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.
bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code.
bq.
bq.
bq. This addresses bug HBASE-1512.
bq. https://issues.apache.org/jira/browse/HBASE-1512
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION
bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION
bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION
bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION
bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/585/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. TestAggFunctions passes.
bq.
bq.
bq. Thanks,
bq.
bq. Ted
bq.
bq.

> Coprocessors: Support aggregate functions
> -----------------------------------------
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
> Issue Type: Sub-task
> Components: coprocessors
> Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facilities (generally, anything where you want to calculate some meta info on your table), it seems like it wouldn't be too hard to make a filter type that could run a function server-side and return ONLY the result of the aggregation or whatever.
> For example, say you just want to count rows: currently you scan, the server returns all data to the client, and the count is done by the client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql: when you ask it to count, it returns a 'table' with a count column whose value is the count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in the scanner too so that it aggregated the per-region counts.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
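
For readers following the thread, here is a minimal sketch of the per-region idea from the issue description, combined with the notion of an interpreter that decides how a value's byte array is read. It is not the code from the patch and uses no HBase classes: `RegionAggregateSketch`, `ValueInterpreter`, `LongInterpreter`, and every method name below are hypothetical, each region is modelled as a plain list of cell values with one cell per row, and values are assumed to be 8-byte big-endian longs.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

public class RegionAggregateSketch {

    /** Hypothetical stand-in for an interpreter: tells the aggregator how to read raw bytes. */
    public interface ValueInterpreter<T> {
        T decode(byte[] raw);
        T add(T a, T b);
        T zero();
    }

    /** Assumption of this sketch: every value is an 8-byte big-endian long. */
    public static final class LongInterpreter implements ValueInterpreter<Long> {
        public Long decode(byte[] raw) { return ByteBuffer.wrap(raw).getLong(); }
        public Long add(Long a, Long b) { return a + b; }
        public Long zero() { return 0L; }
    }

    /** "Server side": a region reduces its own cells to one partial sum. */
    public static <T> T regionSum(List<byte[]> cells, ValueInterpreter<T> ci) {
        T sum = ci.zero();
        for (byte[] cell : cells) {
            sum = ci.add(sum, ci.decode(cell));
        }
        return sum;
    }

    /** "Server side": a region returns only its row count (one cell per row here). */
    public static long regionRowCount(List<byte[]> cells) {
        return cells.size();
    }

    /** "Client side": merge the per-region partial sums into the final answer. */
    public static <T> T mergeSums(List<T> partials, ValueInterpreter<T> ci) {
        T total = ci.zero();
        for (T p : partials) {
            total = ci.add(total, p);
        }
        return total;
    }

    /** "Client side": merge the per-region row counts. */
    public static long mergeCounts(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    /** Helper to build test data: encodes a long as 8 big-endian bytes. */
    public static byte[] encode(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    public static void main(String[] args) {
        LongInterpreter ci = new LongInterpreter();
        List<byte[]> regionA = Arrays.asList(encode(1), encode(2), encode(3));
        List<byte[]> regionB = Arrays.asList(encode(10), encode(20));

        // Only one small value per region crosses the "network", not the rows themselves.
        long rows = mergeCounts(Arrays.asList(regionRowCount(regionA), regionRowCount(regionB)));
        long sum = mergeSums(Arrays.asList(regionSum(regionA, ci), regionSum(regionB, ci)), ci);

        System.out.println("rows=" + rows + " sum=" + sum); // prints "rows=5 sum=36"
    }
}
```

The point of the split is that each region ships back a single partial result, and the client only has to combine those partials; swapping in a different interpreter changes how the bytes are read without touching the aggregation loop.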