hbase-issues mailing list archives

From "Nichole Treadway (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-4435) Add Group By functionality using Coprocessors
Date Mon, 19 Sep 2011 15:27:09 GMT

     [ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nichole Treadway updated HBASE-4435:
------------------------------------

    Description: 
Adds Group By-like functionality to HBase, using the Coprocessor framework.

It groups the result set on one or more columns (groupBy families) and computes statistics
(max, min, sum, count, sum of squares, number missing) for a second column, called the
stats column.

To use it, I've provided two implementations.

1. In the first, you specify a single group-by column and a stats field:

      statsMap = gbc.getStats(tableName, scan, groupByFamily, groupByQualifier,
          statsFamily, statsQualifier, statsFieldColumnInterpreter);

The result is a map from the group-by column value (as a String) to a GroupByStatsValues object.
The GroupByStatsValues object holds the max, min, sum, etc. of the stats column for that group.
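
To make the shape of that result concrete, here is a minimal plain-Java sketch of the per-group accumulation. It is not the code from the patch and does not touch the HBase API: the class GroupStatsSketch, its Row and Stats types, and their field names are hypothetical stand-ins for the scan results and for GroupByStatsValues, and treating a null value as "missing" is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a hypothetical stand-in for the grouping done by the patch.
final class GroupStatsSketch {

    // One input row: the group-by column value and the stats column value (null if absent).
    static final class Row {
        final String group;
        final Double value;

        Row(String group, Double value) {
            this.group = group;
            this.value = value;
        }
    }

    // Hypothetical counterpart of GroupByStatsValues; field names are assumptions.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        double sumOfSquares;
        long count;
        long missing;

        // Fold one stats-column value into the running totals.
        void accumulate(Double value) {
            if (value == null) {
                missing++;
                return;
            }
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            sumOfSquares += value * value;
            count++;
        }
    }

    // Build the map from group-by value to the statistics for that group.
    static Map<String, Stats> groupBy(List<Row> rows) {
        Map<String, Stats> statsMap = new TreeMap<>();
        for (Row row : rows) {
            statsMap.computeIfAbsent(row.group, k -> new Stats()).accumulate(row.value);
        }
        return statsMap;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("east", 2.0),
            new Row("east", 4.0),
            new Row("west", 3.0),
            new Row("west", null));
        groupBy(rows).forEach((group, s) -> System.out.println(
            group + ": count=" + s.count + " sum=" + s.sum + " missing=" + s.missing));
    }
}
```

Keeping sum, count and sum of squares (rather than a mean or variance) means partial results can be combined by simple addition, and the mean and variance can still be derived afterwards.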

2. The second implementation lets you specify a list of group-by columns and a stats
field. Each entry in the list of group-by columns is expected to be a {column family, qualifier}
pair.

      statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns,
          statsFamily, statsQualifier, statsFieldColumnInterpreter);
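
As a rough sketch of how a list of {column family, qualifier} pairs can identify a group, the plain-Java example below builds one composite key per row. It is not taken from the patch: the class MultiColumnGroupKeySketch, the "family:qualifier" row layout, the '|' separator, and the empty string for an absent cell are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Illustrative only: hypothetical composite keys for grouping on several columns.
final class MultiColumnGroupKeySketch {

    // Join the row's values for each {family, qualifier} pair into a single key.
    // The '|' separator and "" for an absent cell are assumptions.
    static String groupKey(Map<String, String> row, List<String[]> groupByColumns) {
        StringJoiner key = new StringJoiner("|");
        for (String[] column : groupByColumns) {
            String cell = row.get(column[0] + ":" + column[1]);
            key.add(cell == null ? "" : cell);
        }
        return key.toString();
    }

    // Count the rows that fall into each composite group.
    static Map<String, Long> countByGroup(List<Map<String, String>> rows,
                                          List<String[]> groupByColumns) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map<String, String> row : rows) {
            counts.merge(groupKey(row, groupByColumns), 1L, Long::sum);
        }
        return counts;
    }

    // Convenience builder: row("geo:country", "US", "src:channel", "web").
    static Map<String, String> row(String... keysAndValues) {
        Map<String, String> cells = new HashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            cells.put(keysAndValues[i], keysAndValues[i + 1]);
        }
        return cells;
    }

    public static void main(String[] args) {
        List<String[]> listOfGroupByColumns = Arrays.asList(
            new String[] {"geo", "country"},
            new String[] {"src", "channel"});
        List<Map<String, String>> rows = Arrays.asList(
            row("geo:country", "US", "src:channel", "web"),
            row("geo:country", "US", "src:channel", "web"),
            row("geo:country", "DE", "src:channel", "mail"));
        System.out.println(countByGroup(rows, listOfGroupByColumns));
    }
}
```

A production key would need a separator or encoding that cannot collide with cell contents; the single character used here is only for readability.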


The GroupByStatsValues code is adapted from the Solr Stats component.



  was:
Adds in a Group By -like functionality to HBase, using the Coprocessor framework. 

It provides the ability to group the result set on one or more columns (groupBy families).
It computes statistics (max, min, sum, count, sum of squares, number missing) for a second
column, called the stats column. 

To use, I've provided two implementations.

1. In the first, you specify a single group-by column and a stats field:

      statsMap = gbc.getStats(tableName, scan, groupByFamily, groupByQualifier,
          statsFamily, statsQualifier, statsFieldColumnInterpreter);

The result is a map with the Group By column value (as a String) to a GroupByStatsValues object.
The GroupByStatsValues object has max,min,sum etc. of the stats column for that group.

2. The second implementation allows you to specify a list of group-by columns and a stats
field. The List of group-by columns is expected to contain lists of {column family, qualifier}
pairs. 

      statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns,
          statsFamily, statsQualifier, statsFieldColumnInterpreter);




> Add Group By functionality using Coprocessors
> ---------------------------------------------
>
>                 Key: HBASE-4435
>                 URL: https://issues.apache.org/jira/browse/HBASE-4435
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors
>            Reporter: Nichole Treadway
>            Priority: Minor
>         Attachments: HBase-4435.patch

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
