hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
Date Wed, 17 Oct 2012 20:42:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478351#comment-13478351 ]

Ted Yu commented on HBASE-4435:

I didn't find any tests in the patch. It would be difficult for a feature to be accepted without
new tests.
Should GroupByStatsValues be named GroupByStats (since stats imply some values)?

+ * Copyright 2012 The Apache Software Foundation
The above line is no longer needed in the license header.

BigDecimalColumnInterpreter is covered in HBASE-6669. To keep the workload reasonable for
this JIRA, you can exclude it from the patch.

+public class CharacterColumnInterpreter implements ColumnInterpreter<Character, Character>
Add audience and stability annotations to public classes.
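
For illustration, a minimal sketch of what annotating a public class could look like. The nested InterfaceAudience and InterfaceStability types below are local stand-ins written only so the snippet compiles on its own; in the real patch they would be imported from the project's classification package rather than redeclared, and the choice of Public/Evolving is an assumption, not something the review prescribes.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationSketch {

  // Stand-in for the audience annotation holder (assumption: mirrors the
  // shape of the classification annotations used by the project).
  public static final class InterfaceAudience {
    private InterfaceAudience() {}

    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Public {}
  }

  // Stand-in for the stability annotation holder.
  public static final class InterfaceStability {
    private InterfaceStability() {}

    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Evolving {}
  }

  // The class from the patch, reduced to its declaration. The annotations
  // tell users who may depend on it and how stable its API is.
  @InterfaceAudience.Public
  @InterfaceStability.Evolving
  public static class CharacterColumnInterpreter {
    // interpreter methods omitted in this sketch
  }
}
```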

In GroupByClient.java, the following import can be removed:
+import com.sun.istack.logging.Logger;
+    Map<Text, GroupByStatsValues<T, S>> getStats(
+      final byte[] tableName, final Scan scan, 
+      final List<byte [][]> groupByTuples, final byte[][] statsTuple, 
The @param tags for the above method don't match the actual parameters - you probably changed
the API in a later iteration.
+    class RowNumCallback implements
The above class can be made private.
I think we should find a better name for the above class - it does aggregation.
+        long bt = System.currentTimeMillis();
Please use EnvironmentEdge instead.
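
To show why routing time through an edge helps, here is a self-contained sketch. The EnvironmentEdge interface and EnvironmentEdgeManager class below are simplified stand-ins written for this example (assumptions modelled on the HBase utility of the same name, not copies of it); the point is only that a test can inject a fake clock when the code never calls System.currentTimeMillis() directly.

```java
public class ClockSketch {

  /** Stand-in for an environment edge: a pluggable source of "now". */
  public interface EnvironmentEdge {
    long currentTimeMillis();
  }

  /** Stand-in manager: static access point with a replaceable delegate. */
  public static final class EnvironmentEdgeManager {
    private static volatile EnvironmentEdge delegate = System::currentTimeMillis;

    private EnvironmentEdgeManager() {}

    public static long currentTimeMillis() {
      return delegate.currentTimeMillis();
    }

    /** Lets a test substitute a deterministic clock. */
    public static void injectEdge(EnvironmentEdge edge) {
      delegate = edge;
    }

    /** Restores the system clock. */
    public static void reset() {
      delegate = System::currentTimeMillis;
    }
  }

  /** Times a task through the manager instead of the system clock. */
  public static long elapsedMillis(Runnable task) {
    long bt = EnvironmentEdgeManager.currentTimeMillis();
    task.run();
    return EnvironmentEdgeManager.currentTimeMillis() - bt;
  }
}
```

With a fake edge injected, a test can advance time by a known amount and assert on the exact elapsed value, which is not possible when the timing code reads the wall clock itself.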
+    table.close();
Please enclose the above in a finally clause.
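
A minimal sketch of the pattern being asked for. FakeTable is a hypothetical stand-in for the table handle so the example runs without HBase, and getRowCount is an illustrative name, not a method from the patch.

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseInFinallySketch {

  /** Hypothetical stand-in for a table handle that must be closed. */
  public static class FakeTable implements Closeable {
    public boolean closed = false;

    public long countRows(boolean fail) throws IOException {
      if (fail) {
        throw new IOException("simulated failure during the call");
      }
      return 42L;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Closes the table whether or not the call succeeds. */
  public static long getRowCount(FakeTable table, boolean fail) throws IOException {
    try {
      return table.countRows(fail);
    } finally {
      // Runs on both the normal and the exceptional path.
      table.close();
    }
  }
}
```

Without the finally block, an exception thrown by the call would skip close() and leak the handle.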
> Add Group By functionality using Coprocessors
> ---------------------------------------------
>                 Key: HBASE-4435
>                 URL: https://issues.apache.org/jira/browse/HBASE-4435
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors
>            Reporter: Nichole Treadway
>            Priority: Minor
>              Labels: by, coprocessors, group, hbase
>         Attachments: HBase-4435.patch, HBASE-4435-v2.patch
> Adds in a Group By -like functionality to HBase, using the Coprocessor framework.
> It provides the ability to group the result set on one or more columns (groupBy families).
> It computes statistics (max, min, sum, count, sum of squares, number missing) for a second
> column, called the stats column.
> To use, I've provided two implementations.
> 1. In the first, you specify a single group-by column and a stats field:
>       statsMap = gbc.getStats(tableName, scan, groupByFamily, groupByQualifier, statsFamily,
>       statsQualifier, statsFieldColumnInterpreter);
> The result is a map with the Group By column value (as a String) to a GroupByStatsValues
> object. The GroupByStatsValues object has max, min, sum etc. of the stats column for that group.
> 2. The second implementation allows you to specify a list of group-by columns and a stats
> field. The List of group-by columns is expected to contain lists of {column family, qualifier}
>       statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, statsFamily, statsQualifier,
> The GroupByStatsValues code is adapted from the Solr Stats component.
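
To make the statistics in the description concrete, here is a rough, self-contained sketch of a per-group accumulator. It is not the patch's GroupByStatsValues: the Stats class, the groupBy and mergeAll helpers, the use of String keys and Double values, and the idea that per-region partial results are merged on the client are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupByStatsSketch {

  /** Running statistics for one group (hypothetical stand-in). */
  public static final class Stats {
    public double min = Double.POSITIVE_INFINITY;
    public double max = Double.NEGATIVE_INFINITY;
    public double sum = 0.0;
    public double sumOfSquares = 0.0;
    public long count = 0;
    public long missing = 0;

    /** Folds one value in; null models a row without the stats column. */
    public void accumulate(Double value) {
      if (value == null) {
        missing++;
        return;
      }
      double v = value;
      min = Math.min(min, v);
      max = Math.max(max, v);
      sum += v;
      sumOfSquares += v * v;
      count++;
    }

    /** Combines a partial result, e.g. one computed on another region. */
    public void merge(Stats other) {
      min = Math.min(min, other.min);
      max = Math.max(max, other.max);
      sum += other.sum;
      sumOfSquares += other.sumOfSquares;
      count += other.count;
      missing += other.missing;
    }
  }

  /** Groups values by key and accumulates statistics per group. */
  public static Map<String, Stats> groupBy(String[] keys, Double[] values) {
    Map<String, Stats> out = new LinkedHashMap<>();
    for (int i = 0; i < keys.length; i++) {
      out.computeIfAbsent(keys[i], k -> new Stats()).accumulate(values[i]);
    }
    return out;
  }

  /** Merges partial maps into a single result keyed by group value. */
  public static Map<String, Stats> mergeAll(Iterable<Map<String, Stats>> partials) {
    Map<String, Stats> merged = new LinkedHashMap<>();
    for (Map<String, Stats> partial : partials) {
      for (Map.Entry<String, Stats> e : partial.entrySet()) {
        merged.computeIfAbsent(e.getKey(), k -> new Stats()).merge(e.getValue());
      }
    }
    return merged;
  }
}
```

All six statistics are mergeable by simple min, max, or addition, which is what makes it possible to compute them independently on separate slices of the data and combine the results afterwards.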

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
