phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "James Taylor (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-3773) Implement FIRST_VALUES aggregate function
Date Wed, 31 May 2017 22:53:04 GMT


James Taylor commented on PHOENIX-3773:

Thanks for the updated patch, [~singamteja]. It's looking very good. Here's some feedback:

Instead of deserializing the byte arrays to objects (a relatively expensive operation) in
the evaluate method of FirstLastValueBaseClientAggregator, store the bytes pointer in a List<ImmutableBytesWritable>.
+                    if (getMultipleValues) {
+                        multipleValuesResult.add(PDataType.fromTypeId(dataType.getSqlType()
- PDataType.ARRAY_TYPE_BASE).toObject(;
+                        if (++counter == offset) {
+                            break;
+                        }
+                    } else {
Then, once you have all the values, combine them together using the PArrayDataType.appendItemToArray()
method outside the loop.
-            //not enought values to return Nth
+            if (getMultipleValues && multipleValuesResult.size() > 0) {
+                ptr.set(dataType.toBytes(PArrayDataType.instantiatePhoenixArray(PDataType.fromTypeId(dataType.getSqlType()
- PDataType.ARRAY_TYPE_BASE), multipleValuesResult.toArray())));
+                return true;
+            }
+            //not enough values to return Nth or top N'values or trying to retreive (N+1)th
I think the handling of the case in which there are no values may not be handled correctly.
You need to move this check to the top of the aggregate method:
        if (topValue == null) {
            return false;
And this would be the only case that would return false. Otherwise, if there aren't enough
values, do this:
    return true;
Probably a good idea to have a test that asks for the top 3 values when there are only 2 values
to make sure that case works too (if you don't have that already).

Also, one minor nit. How about instead of calling the new member variable getMultipleValues
you call it isArrayReturnType?

> Implement FIRST_VALUES aggregate function
> -----------------------------------------
>                 Key: PHOENIX-3773
>                 URL:
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: Loknath Priyatham Teja Singamsetty 
>              Labels: SFDC
>             Fix For: 4.11.0
>         Attachments: PHOENIX-3773_4.x-HBase-0.98.patch, PHOENIX-3773_master.patch, PHOENIX-3773.patch,
PHOENIX-3773.v2.patch, PHOENIX-3773.v3.patch
> Similar to FIRST_VALUE, but would allow the user to specify how many values to keep.
This could use a MinMaxPriorityQueue under the covers and be much more efficient than using
multiple NTH_VALUE calls to do the same like this:
> {code}
> SELECT entity_id,
>        NTH_VALUE(user_id,1) WITHIN GROUP (ORDER BY last_read_date DESC) as nth1_user_id,
>        NTH_VALUE(user_id,2) WITHIN GROUP (ORDER BY last_read_date DESC) as nth2_user_id,
>        NTH_VALUE(user_id,3) WITHIN GROUP (ORDER BY last_read_date DESC) as nth3_user_id,
>        count(*)
> WHERE tenant_id='00Dx0000000XXXX'
> AND entity_id in ('0D5x000000ABCD','0D5x000000ABCE')
> GROUP BY entity_id;
> {code}

This message was sent by Atlassian JIRA

View raw message