hbase-issues mailing list archives

From "Himanshu Vashishtha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
Date Sat, 02 Apr 2011 01:12:06 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014908#comment-13014908 ]

Himanshu Vashishtha commented on HBASE-1512:

Thanks for the suggestions, Ted.

a) Added generics to the AggregationClient. As Ted suggested, there is now a
ColumnInterpreter that gives the client a way to describe the cell value type.
It is generic: the client passes a ColumnInterpreter object along with each agg
function call. AggregationClient has one such implementation, for clients whose
cell value is a long. Other cell value types can be supported the same way.
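
To illustrate the idea in (a), here is a minimal, standalone sketch of a generic column interpreter with a long implementation. It is not the code from the attached patch: the method names (getValue, add, compare), the helper functions sum/max/toBytes, and the 8-byte big-endian cell encoding are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

public class ColumnInterpreterSketch {

    /** Hypothetical interface: decodes a raw cell and combines decoded values. */
    interface ColumnInterpreter<T> {
        T getValue(byte[] cell);   // returns null if the cell cannot be decoded
        T add(T a, T b);           // null-tolerant addition
        int compare(T a, T b);     // both arguments must be non-null
    }

    /** Assumes each cell holds exactly one 8-byte big-endian long. */
    static final class LongColumnInterpreter implements ColumnInterpreter<Long> {
        @Override public Long getValue(byte[] cell) {
            if (cell == null || cell.length != Long.BYTES) return null;
            return ByteBuffer.wrap(cell).getLong();
        }
        @Override public Long add(Long a, Long b) {
            if (a == null) return b;
            if (b == null) return a;
            return a + b;
        }
        @Override public int compare(Long a, Long b) {
            return Long.compare(a, b);
        }
    }

    /** The aggregate never looks at the value type; the interpreter does. */
    static <T> T sum(List<byte[]> cells, ColumnInterpreter<T> ci) {
        T total = null;
        for (byte[] cell : cells) {
            total = ci.add(total, ci.getValue(cell));
        }
        return total;
    }

    static <T> T max(List<byte[]> cells, ColumnInterpreter<T> ci) {
        T best = null;
        for (byte[] cell : cells) {
            T v = ci.getValue(cell);
            if (v != null && (best == null || ci.compare(v, best) > 0)) best = v;
        }
        return best;
    }

    /** Test helper: encode a long the way the interpreter expects it. */
    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        List<byte[]> cells = Arrays.asList(toBytes(3L), toBytes(4L), toBytes(5L));
        LongColumnInterpreter ci = new LongColumnInterpreter();
        System.out.println(sum(cells, ci)); // prints 12
        System.out.println(max(cells, ci)); // prints 5
    }
}
```

Supporting another cell type would then mean writing another interpreter implementation and passing it to the same aggregate calls.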

b) While the client can define the cell value type by implementing ColumnInterpreter, I still
think the average and standard deviation should be double values. So I added a wrapper around
these methods to support the generic functionality. Please refer to AggregationClient.getStdParams
and getAvgParams. Let me know if it is unintuitive. I think it is right though :)
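
As a rough sketch of why (b) is reasonable: each region can return partial results in the cell type (sum, sum of squares, count), and only the final client-side step needs to produce a double. The Partial class, its field names, and the population standard deviation formula sqrt(E[x^2] - E[x]^2) are assumptions for illustration; the patch's getAvgParams and getStdParams may be structured differently.

```java
public class AvgStdSketch {

    /** Hypothetical per-region partial result for long-valued cells. */
    static final class Partial {
        final long sum;
        final long sumOfSquares;
        final long count;

        Partial(long sum, long sumOfSquares, long count) {
            this.sum = sum;
            this.sumOfSquares = sumOfSquares;
            this.count = count;
        }

        /** Combine the partials of two regions. */
        Partial merge(Partial other) {
            return new Partial(sum + other.sum,
                               sumOfSquares + other.sumOfSquares,
                               count + other.count);
        }
    }

    /** What one region would compute over its own cell values. */
    static Partial regionPartial(long[] values) {
        long sum = 0, sumSq = 0;
        for (long v : values) {
            sum += v;
            sumSq += v * v;
        }
        return new Partial(sum, sumSq, values.length);
    }

    /** Client-side wrapper: the result is a double whatever the cell type. */
    static double avg(Partial p) {
        return (double) p.sum / p.count;
    }

    /** Population standard deviation from the merged partials. */
    static double std(Partial p) {
        double mean = avg(p);
        double meanOfSquares = (double) p.sumOfSquares / p.count;
        // Clamp at zero to absorb tiny negative values from rounding.
        return Math.sqrt(Math.max(0.0, meanOfSquares - mean * mean));
    }

    public static void main(String[] args) {
        Partial region1 = regionPartial(new long[] {2, 4, 4, 4});
        Partial region2 = regionPartial(new long[] {5, 5, 7, 9});
        Partial all = region1.merge(region2);
        System.out.println(avg(all)); // prints 5.0
        System.out.println(std(all)); // prints 2.0
    }
}
```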

c) Added a filter parameter to each of the agg functions. The filter is passed along with the
call and set on the Scan object at the region level during scanning. For row count, if the
client provides a filter, that one is used. If neither a filter nor a qualifier is
provided, FirstKeyValueFilter is used.
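
The row count rule in (c) can be written down as a small decision function. This is only a sketch of the rule as described above, with no HBase types involved: the ScanFilter enum and chooseRowCountFilter are hypothetical names, and the behavior when a qualifier is given without a filter (no filter at all) is an assumption, since the comment does not spell that case out.

```java
public class RowCountFilterSketch {

    /** Hypothetical labels for the filter that ends up on the Scan. */
    enum ScanFilter {
        CLIENT_SUPPLIED, // the filter the client passed with the call
        FIRST_KEY_ONLY,  // the default first-key filter named in (c)
        NONE             // assumption: qualifier given, no filter
    }

    static ScanFilter chooseRowCountFilter(boolean clientFilterGiven,
                                           boolean qualifierGiven) {
        if (clientFilterGiven) {
            // A client-provided filter always wins.
            return ScanFilter.CLIENT_SUPPLIED;
        }
        if (!qualifierGiven) {
            // Neither filter nor qualifier: only the first key of each
            // row is needed to count it.
            return ScanFilter.FIRST_KEY_ONLY;
        }
        // Assumed case, not stated in the comment.
        return ScanFilter.NONE;
    }

    public static void main(String[] args) {
        System.out.println(chooseRowCountFilter(true, false));  // prints CLIENT_SUPPLIED
        System.out.println(chooseRowCountFilter(false, false)); // prints FIRST_KEY_ONLY
        System.out.println(chooseRowCountFilter(false, true));  // prints NONE
    }
}
```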

d) Added more test cases covering the filter use cases (44 in total :)).

e) Refactored the "done" variable as suggested by Ted.

> Coprocessors: Support aggregate functions
> -----------------------------------------
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>          Components: coprocessors
>            Reporter: stack
>         Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java,
>                      AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt,
>                      patch-1512-3.txt, patch-1512.txt
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating
> facility, facility generally where you want to calculate some meta info on your table, it
> seems like it wouldn't be too hard making a filter type that could run a function server-side
> and return the result ONLY of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server returns all
> data to client and count is done by client counting up row keys.  A bunch of time and resources
> have been wasted returning data that we're not interested in.  With this new filter type,
> the counting would be done server-side and then it would make up a new result that was the
> count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column
> whose value is count of rows).   We could have it so the count was just done per region and
> return that.  Or we could maybe make a small change in scanner too so that it aggregated the
> per-region counts.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
