hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
Date Wed, 17 Oct 2012 20:42:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478351#comment-13478351 ]

Ted Yu commented on HBASE-4435:
-------------------------------

I didn't find any tests in the patch. It would be difficult for a feature to be accepted without
new tests.
Should GroupByStatsValues be named GroupByStats (since stats imply some values)?

{code}
+ * Copyright 2012 The Apache Software Foundation
{code}
The above line is no longer needed in the license header.

BigDecimalColumnInterpreter is covered in HBASE-6669. To keep the workload reasonable for
this JIRA, you can exclude it from the patch.
{code}
+public class CharacterColumnInterpreter implements ColumnInterpreter<Character, Character> {
{code}
Please add audience and stability annotations (InterfaceAudience, InterfaceStability) to public classes.

In GroupByClient.java, the following import can be removed:
{code}
+import com.sun.istack.logging.Logger;
{code}
{code}
+    Map<Text, GroupByStatsValues<T, S>> getStats(
+      final byte[] tableName, final Scan scan, 
+      final List<byte [][]> groupByTuples, final byte[][] statsTuple, 
{code}
The @param tags for the above method don't match its actual parameters; the API probably
changed in a later iteration.
{code}
+    class RowNumCallback implements
{code}
The above class can be made private.
I think we should find a better name for it, since it does aggregation.
{code}
+        long bt = System.currentTimeMillis();
{code}
Please use EnvironmentEdge instead.
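
To illustrate why a pluggable time source helps, here is a minimal sketch in plain Java. It does not use the HBase classes: TimeSource, ManualTimeSource and TimedWork are hypothetical names made up for this example, standing in for the idea of reading the clock through an injectable abstraction so that tests can control time.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Small demo entry point; all types in this file are hypothetical, not HBase API. */
final class TimeSourceDemo {
    public static void main(String[] args) {
        ManualTimeSource clock = new ManualTimeSource();
        TimedWork work = new TimedWork(clock);
        // The "task" just advances the fake clock, so the result is deterministic.
        long elapsed = work.elapsedMillis(() -> clock.advance(250));
        System.out.println("elapsed ms: " + elapsed);
    }
}

/** Hypothetical stand-in for a pluggable clock. */
interface TimeSource {
    long currentTimeMillis();
}

/** Production-style implementation that delegates to the system clock. */
final class SystemTimeSource implements TimeSource {
    @Override
    public long currentTimeMillis() {
        return System.currentTimeMillis();
    }
}

/** Test implementation whose time only moves when advance() is called. */
final class ManualTimeSource implements TimeSource {
    private final AtomicLong now = new AtomicLong();

    @Override
    public long currentTimeMillis() {
        return now.get();
    }

    void advance(long millis) {
        now.addAndGet(millis);
    }
}

/** Measures elapsed time through the injected source instead of the system clock. */
final class TimedWork {
    private final TimeSource clock;

    TimedWork(TimeSource clock) {
        this.clock = clock;
    }

    long elapsedMillis(Runnable task) {
        long start = clock.currentTimeMillis();
        task.run();
        return clock.currentTimeMillis() - start;
    }
}
```

With the time source injected, a test can assert on elapsed time without sleeping or depending on the wall clock.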
{code}
+    table.close();
{code}
Please enclose the above in a finally clause.
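
As a rough sketch of the pattern (not the patch's actual code), the close call goes in a finally block so the resource is released even when the work throws. FakeTable and CloseInFinally are hypothetical stand-ins for illustration; no HBase classes are used.

```java
import java.io.Closeable;
import java.io.IOException;

/** Small demo entry point; FakeTable is a hypothetical stand-in for a table handle. */
final class CloseInFinally {
    /** Does the work and always closes the table, whether or not the work fails. */
    static long countAndClose(FakeTable table, boolean fail) throws IOException {
        try {
            return table.countRows(fail);
        } finally {
            // Runs on both the normal return path and the exception path.
            table.close();
        }
    }

    public static void main(String[] args) throws IOException {
        FakeTable table = new FakeTable();
        System.out.println("rows: " + countAndClose(table, false));
        System.out.println("closed: " + table.closed);
    }
}

/** Hypothetical resource that records whether close() was called. */
final class FakeTable implements Closeable {
    boolean closed = false;

    long countRows(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated scan failure");
        }
        return 42L;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

Where the resource implements Closeable, a try-with-resources statement would give the same guarantee with less code.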
                
> Add Group By functionality using Coprocessors
> ---------------------------------------------
>
>                 Key: HBASE-4435
>                 URL: https://issues.apache.org/jira/browse/HBASE-4435
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors
>            Reporter: Nichole Treadway
>            Priority: Minor
>              Labels: by, coprocessors, group, hbase
>         Attachments: HBase-4435.patch, HBASE-4435-v2.patch
>
>
> Adds Group By-like functionality to HBase, using the Coprocessor framework.
> It provides the ability to group the result set on one or more columns (groupBy families). It computes statistics (max, min, sum, count, sum of squares, number missing) for a second column, called the stats column.
> To use, I've provided two implementations.
> 1. In the first, you specify a single group-by column and a stats field:
>       statsMap = gbc.getStats(tableName, scan, groupByFamily, groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
> The result is a map from the Group By column value (as a String) to a GroupByStatsValues object. The GroupByStatsValues object has max, min, sum, etc. of the stats column for that group.
> 2. The second implementation allows you to specify a list of group-by columns and a stats field. The list of group-by columns is expected to contain lists of {column family, qualifier} pairs.
>       statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, statsFamily, statsQualifier, statsFieldColumnInterpreter);
> The GroupByStatsValues code is adapted from the Solr Stats component.
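
For readers unfamiliar with the statistics listed in the description, the following is a minimal in-memory sketch of grouping rows by a key and accumulating max, min, sum, count, sum of squares and number missing per group. It is not the patch's code: Row, GroupStats and GroupBySketch are hypothetical names, values are plain doubles rather than ColumnInterpreter types, and the merge step assumes that partial results (for example, one per region) are combined afterwards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory group-by; no HBase or coprocessor classes involved. */
final class GroupBySketch {
    /** Groups rows by key and accumulates statistics for each group. */
    static Map<String, GroupStats> groupBy(List<Row> rows) {
        Map<String, GroupStats> result = new TreeMap<>();
        for (Row row : rows) {
            result.computeIfAbsent(row.groupKey, k -> new GroupStats()).accumulate(row.value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("a", 1.0), new Row("a", 3.0), new Row("b", null), new Row("b", 2.0));
        for (Map.Entry<String, GroupStats> e : groupBy(rows).entrySet()) {
            GroupStats s = e.getValue();
            System.out.println(e.getKey() + ": count=" + s.count + " sum=" + s.sum
                + " min=" + s.min + " max=" + s.max + " missing=" + s.missing);
        }
    }
}

/** One input row: a group key and a stats value that may be missing (null). */
final class Row {
    final String groupKey;
    final Double value;

    Row(String groupKey, Double value) {
        this.groupKey = groupKey;
        this.value = value;
    }
}

/** Running statistics for one group; every field can be merged across partial results. */
final class GroupStats {
    long count;
    long missing;
    double sum;
    double sumOfSquares;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    /** Adds one value; a null value is counted as missing. */
    void accumulate(Double v) {
        if (v == null) {
            missing++;
            return;
        }
        count++;
        sum += v;
        sumOfSquares += v * v;
        min = Math.min(min, v);
        max = Math.max(max, v);
    }

    /** Folds another partial result into this one. */
    void merge(GroupStats other) {
        count += other.count;
        missing += other.missing;
        sum += other.sum;
        sumOfSquares += other.sumOfSquares;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
    }
}
```

Keeping the sum of squares alongside the sum and count is what allows a variance or standard deviation to be derived after partial results are merged, without revisiting the rows.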

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
