hbase-issues mailing list archives

From "jiraposter@reviews.apache.org (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
Date Sat, 16 Apr 2011 00:29:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020506#comment-13020506 ]

jiraposter@reviews.apache.org commented on HBASE-1512:
------------------------------------------------------



bq.  On 2011-04-15 19:06:58, Ted Yu wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
line 84
bq.  > <https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84>
bq.  >
bq.  >     I think the startKey and endKey can be optional as well.
bq.  >     Basically that means scanning the whole region.
bq.  
bq.  himanshu vashishtha wrote:
bq.      These start and end keys are used to locate the regions of interest. Do you mean the whole
_table_? If so, that would be like setting HConstants.EMPTY_START_ROW/EMPTY_END_ROW, which are essentially
empty byte arrays.
bq.  
bq.  Gary Helmling wrote:
bq.      This would be a bigger change, but maybe it would make sense to have the client pass
a Scan object?  Then you could specify start/end row, time range, multiple column qualifiers,
filter?
bq.      
bq.      It's starting to look like we're duplicating most of these arguments when there's
already a good way of passing them.  What do you think?
bq.  
bq.  himanshu vashishtha wrote:
bq.      Yes, I'm wondering why it didn't occur to me before! As a matter of fact, we already create
a Scan object at the region level. So if the Scan object is passed to the AggregationClient,
it will call the appropriate (existing) HTable method, while the CP's method will take
the Scan object as a parameter and leave the client free to configure it. This needs some code
changes, though, in the validation logic for one.
bq.      (I was thinking that it was good to go, and now there is so much room for improvement.
Good stuff.)
bq.  
bq.  himanshu vashishtha wrote:
bq.      Continuing from what I said earlier: the current design assumes that the client
is interested in one family only. Does this need to change too?
bq.      I am refactoring these methods to let the client pass a Scan object to the AggregationClient
class, but a Scan object can have multiple families in it. Do we need to change this
assumption? I don't see any issue with doing so, but it is something I didn't plan for originally,
and it needs changes in the test cases. Please comment.
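
To make the single-family question above concrete, here is a rough sketch of the kind of
validation being discussed. It is not code from the patch: ScanSpec is a hypothetical
stand-in for HBase's Scan so that the example compiles on its own, and both checks (exactly
one family, start row not sorting after stop row) are assumptions about what the
"validation stuff" might cover.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative only: ScanSpec is a hypothetical stand-in, not HBase's Scan class. */
final class ScanSpecSketch {

    /** Bundles the arguments the client would otherwise pass one by one. */
    static final class ScanSpec {
        final byte[] startRow;
        final byte[] stopRow;
        final Set<String> families = new LinkedHashSet<>();

        ScanSpec(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow.clone();
            this.stopRow = stopRow.clone();
        }

        ScanSpec addFamily(String family) {
            families.add(family);
            return this;
        }
    }

    /** Lexicographic comparison treating bytes as unsigned, as row keys are ordered. */
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** Rejects specs that the single-family design assumption cannot handle. */
    static void validate(ScanSpec spec) {
        if (spec.families.size() != 1) {
            throw new IllegalArgumentException(
                "expected exactly one column family, got " + spec.families.size());
        }
        // An empty stop row means "to the end of the table", so only compare when it is set.
        if (spec.stopRow.length > 0 && compareRows(spec.startRow, spec.stopRow) > 0) {
            throw new IllegalArgumentException("startRow sorts after stopRow");
        }
    }
}
```

If the single-family assumption were dropped, only the first check would need to go; the
row-range check does not depend on how many families the scan names.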

I refactored an agg method for 1512 as per today's review (using a Scan object plus the other changes) and it's
working fine (the test passes for the method that I changed). I may need to add more boundary
conditions to test the Scan object.
I have some other commitments tonight/tomorrow, so I will complete this by tomorrow night or by Sunday.
I hope that is OK.


- himanshu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
-----------------------------------------------------------


On 2011-04-13 08:37:14, Ted Yu wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-04-13 08:37:14)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This patch provides a reference implementation for aggregate function support through the Coprocessor
framework.
bq.  The ColumnInterpreter interface allows the client to specify how the value's byte array is interpreted.
bq.  Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and some of the
code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.      https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION
bq.    /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.
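
The review summary quoted above says ColumnInterpreter lets the client decide how a value's
byte array is interpreted. As a rough illustration of that idea only, here is a
self-contained sketch. ValueInterpreter and its methods are hypothetical names chosen for
this example and are not the interface from the patch; the 8-byte big-endian long encoding
and the choice to skip undecodable values are likewise assumptions.

```java
import java.nio.ByteBuffer;
import java.util.List;

/** Illustrative only: these types are hypothetical, not the patch's ColumnInterpreter. */
final class InterpreterSketch {

    /** Tells the aggregation code how to decode and combine cell values. */
    interface ValueInterpreter<T> {
        /** Returns the decoded value, or null if the bytes cannot be interpreted. */
        T decode(byte[] value);

        /** Adds two decoded values; a null operand is treated as "no value". */
        T add(T a, T b);
    }

    /** Interprets each value as an 8-byte big-endian long (assumed encoding). */
    static final class LongInterpreter implements ValueInterpreter<Long> {
        @Override
        public Long decode(byte[] value) {
            if (value == null || value.length != Long.BYTES) {
                return null;
            }
            return ByteBuffer.wrap(value).getLong();
        }

        @Override
        public Long add(Long a, Long b) {
            if (a == null) {
                return b;
            }
            if (b == null) {
                return a;
            }
            return a + b;
        }
    }

    /** Sums raw values without knowing their type; undecodable values are skipped. */
    static <T> T sum(List<byte[]> values, ValueInterpreter<T> interpreter) {
        T total = null;
        for (byte[] v : values) {
            total = interpreter.add(total, interpreter.decode(v));
        }
        return total;
    }
}
```

The point of the indirection is that sum never touches the bytes itself, so a different
interpreter could be swapped in for another value type without changing the aggregation loop.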



> Coprocessors: Support aggregate functions
> -----------------------------------------
>
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>          Components: coprocessors
>            Reporter: stack
>         Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java,
AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt,
patch-1512-5.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating
facilities (generally, anywhere you want to calculate some meta info on your table), it
seems like it wouldn't be too hard to make a filter type that could run a function server-side
and return ONLY the result of the aggregation.
> For example, say you just want to count rows. Currently you scan, the server returns all
data to the client, and the count is done by the client counting up row keys.  A bunch of time and resources
have been wasted returning data that we're not interested in.  With this new filter type,
the counting would be done server-side and then it would make up a new result that was the
count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column
whose value is the count of rows).   We could have it so the count was just done per region and
returned.  Or we could maybe make a small change in the scanner too so that it aggregated the
per-region counts.
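
The per-region counting described in the issue text above can be sketched with a toy model.
This is not HBase code: Region here is a hypothetical in-memory stand-in holding sorted row
keys, and an empty string is assumed to mean an unbounded start or stop row. Each region
returns a single number, and the client only adds those numbers up.

```java
import java.util.List;
import java.util.TreeSet;

/** Toy model of per-region row counting; not HBase code. */
final class RegionCountSketch {

    /** Hypothetical stand-in for a region: a sorted set of row keys. */
    static final class Region {
        private final TreeSet<String> rowKeys = new TreeSet<>();

        Region put(String rowKey) {
            rowKeys.add(rowKey);
            return this;
        }

        /** "Server side": counts rows in [startRow, stopRow); "" means unbounded. */
        long countRows(String startRow, String stopRow) {
            long n = 0;
            for (String key : rowKeys) {
                boolean afterStart = startRow.isEmpty() || key.compareTo(startRow) >= 0;
                boolean beforeStop = stopRow.isEmpty() || key.compareTo(stopRow) < 0;
                if (afterStart && beforeStop) {
                    n++;
                }
            }
            return n;
        }
    }

    /** "Client side": receives one long per region and sums them. */
    static long countRows(List<Region> regions, String startRow, String stopRow) {
        long total = 0;
        for (Region region : regions) {
            total += region.countRows(startRow, stopRow);
        }
        return total;
    }
}
```

In this model no row data leaves a region, which is the saving the issue description is after.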

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
