hbase-issues mailing list archives

From "Himanshu Vashishtha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
Date Thu, 31 Mar 2011 04:36:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013826#comment-13013826 ]

Himanshu Vashishtha commented on HBASE-1512:

Thanks for reviewing it, Ted.

I will add the constructor. 

Yes, I was thinking about this dependency on a long variable in all of these methods.
But the flexibility of storing any data type (by converting it to a byte array), even within
a specific column family:column qualifier, makes it a bit tricky to go for a data type
argument: I can have a varying number of data types even for one CF:CQ combination. Instead,
I was considering adding one more check for the int type (4 bytes). But that is just me; it
would be great to hear what others say about it.
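
To make the "additional check for int type" idea concrete, here is a minimal sketch, not the code from the attached patch. It assumes cell values are big-endian byte arrays (the layout HBase's Bytes.toBytes produces) and tells long from int purely by length. The class and method names are hypothetical, and java.nio.ByteBuffer stands in for the HBase Bytes helpers so the snippet runs on its own.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: decides how to read a cell value from its length alone.
final class CellValueDecoder {

    // Returns the value widened to long; accepts 8-byte longs and 4-byte ints.
    static long decodeAsLong(byte[] value) {
        if (value == null) {
            throw new IllegalArgumentException("value is null");
        }
        if (value.length == Long.BYTES) {
            // 8 bytes: read as a big-endian long.
            return ByteBuffer.wrap(value).getLong();
        }
        if (value.length == Integer.BYTES) {
            // 4 bytes: read as a big-endian int, then widen.
            return ByteBuffer.wrap(value).getInt();
        }
        // Any other width cannot be identified by length alone.
        throw new IllegalArgumentException(
            "unsupported value length: " + value.length);
    }
}
```

A length check like this cannot tell a 4-byte int from a 4-byte float, or a long from a double, which is the same ambiguity that makes a data type argument hard to pin down per CF:CQ.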

Adding the type parameter to the AggregateCpProtocol methods creates a dependency
with AggregationClient. Did you try adding it there too (apart from its impl)?

> Coprocessors: Support aggregate functions
> -----------------------------------------
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>          Components: coprocessors
>            Reporter: stack
>         Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt
> Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating
> facility, facility generally where you want to calculate some meta info on your table, it
> seems like it wouldn't be too hard making a filter type that could run a function server-side
> and return the result ONLY of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server returns all
> data to client and count is done by client counting up row keys.  A bunch of time and resources
> have been wasted returning data that we're not interested in.  With this new filter type,
> the counting would be done server-side and then it would make up a new result that was the
> count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column
> whose value is count of rows).   We could have it so the count was just done per region and
> return that.  Or we could maybe make a small change in scanner too so that it aggregated the
> per-region counts.
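
The per-region counting described in the issue can be pictured with a small in-memory sketch. It uses no HBase API: each "region" is just a list of row keys, and all class and method names are hypothetical. It only illustrates that each region hands back a single number and the caller sums those numbers, instead of shipping every row to the client.

```java
import java.util.List;

// Hypothetical in-memory model of per-region counting; not HBase code.
final class RegionRowCountSketch {

    // Stand-in for the server side: a region counts its own rows
    // and returns only the count, never the row data.
    static long countRowsInRegion(List<String> rowKeysInRegion) {
        return rowKeysInRegion.size();
    }

    // Stand-in for the client (or scanner) side: add up the per-region counts.
    static long aggregate(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += countRowsInRegion(region);
        }
        return total;
    }
}
```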
