phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "chenglei (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving
Date Sun, 15 Apr 2018 01:27:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

chenglei updated PHOENIX-4690:
------------------------------
    Description: 
Given a table :
{code}
 create table test ( 
       pk1 integer not null , 
       pk2 integer not null, 
       v integer, 
       CONSTRAINT TEST_PK PRIMARY KEY (pk1,pk2))
{code}

and some data:
{code}
+------+------+-----+
| PK1  | PK2  |  V  |
+------+------+-----+
| 1    | 8    | 10  |
| 1    | 9    | 11  |
| 2    | 3    | 13  |
| 2    | 7    | 15  |
| 3    | 2    | 17  |
+------+------+-----+
{code}

for following sql :
{code}
    select pk2,pk1,count(v) from test group by pk2,pk1 order by pk2,pk1
{code}

the expected result is :
{code}
+------+------+-----------+
| PK2  | PK1  | COUNT(V)  |
+------+------+-----------+
| 2    | 3    | 1         |
| 3    | 2    | 1         |
| 7    | 2    | 1         |
| 8    | 1    | 1         |
| 9    | 1    | 1         |
+------+------+-----------+
{code}

but the actual result is :
{code}
+------+------+-----------+
| PK2  | PK1  | COUNT(V)  |
+------+------+-----------+
| 8    | 1    | 1         |
| 9    | 1    | 1         |
| 3    | 2    | 1         |
| 7    | 2    | 1         |
| 2    | 3    | 1         |
+------+------+-----------+
{code}

The problem is caused by the {{GroupBy.compile}}, obviously, in line 154, for {{group by pk2,pk1}},
{{OrderPreservingTracker.isOrderPreserving}} is true by reorder the {{pk2,pk1}} to {{pk1,pk2}}
following the order of PK columns {{pk1,pk2}},  but in line 158, for the new GroupBy, the
GroupBy.expressions is still {{pk2,pk1}},not the actual {{pk1,pk2}}:

{code}

141       public GroupBy compile(StatementContext context, TupleProjector tupleProjector)
throws SQLException {
142              boolean isOrderPreserving = this.isOrderPreserving;
143              int orderPreservingColumnCount = 0;
144              if (isOrderPreserving) {
145                    OrderPreservingTracker tracker = new OrderPreservingTracker(context,
GroupBy.EMPTY_GROUP_BY, Ordering.UNORDERED, expressions.size(), tupleProjector);
146              for (int i = 0; i < expressions.size(); i++) {
147                    Expression expression = expressions.get(i);
148                    tracker.track(expression);
149                }
150                
151                // This is true if the GROUP BY is composed of only PK columns. We further
check here that
152                // there are no "gaps" in the PK columns positions used (i.e. we start
with the first PK
153                // column and use each subsequent one in PK order).
154                isOrderPreserving = tracker.isOrderPreserving();
155                orderPreservingColumnCount = tracker.getOrderPreservingColumnCount();
156            }
157            if (isOrderPreserving || isUngroupedAggregate) {
158                return new GroupBy.GroupByBuilder(this).setIsOrderPreserving(isOrderPreserving)
.setOrderPreservingColumnCount(orderPreservingColumnCount).build();
159     }

{code}

Then when we compile {{order by pk2,pk1}} in {{OrderByCompiler.compile}}, because {{the GroupBy.isOrderPreserving}}
is true, and {{order by pk2,pk1}} is consistent with the GroupBy.expressions {{group by pk2,pk1}}
created in above  {{GroupBy.compile}} method, so the result of  {{OrderByCompiler.compile}}
is {{OrderBy.FWD_ROW_KEY_ORDER_BY}}.

But in fact,because the actual GroupBy.expressions is {{group by pk1,pk2}},so we need to execute
 {{order by pk2,pk1}} after the {{group by pk1,pk2}}.

  was:
Given a table :
{code}
 create table test ( 
       pk1 integer not null , 
       pk2 integer not null, 
       v integer, 
       CONSTRAINT TEST_PK PRIMARY KEY (pk1,pk2))
{code}

and some data:
{code}
+------+------+-----+
| PK1  | PK2  |  V  |
+------+------+-----+
| 1    | 8    | 10  |
| 1    | 9    | 11  |
| 2    | 3    | 13  |
| 2    | 7    | 15  |
| 3    | 2    | 17  |
+------+------+-----+
{code}

for following sql :
{code}
    select pk2,pk1,count(v) from test group by pk2,pk1 order by pk2,pk1
{code}

the expected result is :
{code}
+------+------+-----------+
| PK2  | PK1  | COUNT(V)  |
+------+------+-----------+
| 2    | 3    | 1         |
| 3    | 2    | 1         |
| 7    | 2    | 1         |
| 8    | 1    | 1         |
| 9    | 1    | 1         |
+------+------+-----------+
{code}

but the actual result is :
{code}
+------+------+-----------+
| PK2  | PK1  | COUNT(V)  |
+------+------+-----------+
| 8    | 1    | 1         |
| 9    | 1    | 1         |
| 3    | 2    | 1         |
| 7    | 2    | 1         |
| 2    | 3    | 1         |
+------+------+-----------+
{code}

The problem is caused by the {{GroupBy.compile}}, obviously, in line 154, for {{group by pk2,pk1}},
{{OrderPreservingTracker.isOrderPreserving}} is true by reorder the {{pk2,pk1}} to {{pk1,pk2}}
following the order of PK columns {{pk1,pk2}},  but in line 158, for the new GroupBy, the
GroupBy.expressions is still {{pk2,pk1}},not the actual {{pk1,pk2}}:

{code}

141       public GroupBy compile(StatementContext context, TupleProjector tupleProjector)
throws SQLException {
142              boolean isOrderPreserving = this.isOrderPreserving;
143              int orderPreservingColumnCount = 0;
144              if (isOrderPreserving) {
145                    OrderPreservingTracker tracker = new OrderPreservingTracker(context,
GroupBy.EMPTY_GROUP_BY, Ordering.UNORDERED, expressions.size(), tupleProjector);
146              for (int i = 0; i < expressions.size(); i++) {
147                    Expression expression = expressions.get(i);
148                    tracker.track(expression);
149                }
150                
151                // This is true if the GROUP BY is composed of only PK columns. We further
check here that
152                // there are no "gaps" in the PK columns positions used (i.e. we start
with the first PK
153                // column and use each subsequent one in PK order).
154                isOrderPreserving = tracker.isOrderPreserving();
155                orderPreservingColumnCount = tracker.getOrderPreservingColumnCount();
156            }
157            if (isOrderPreserving || isUngroupedAggregate) {
158                return new GroupBy.GroupByBuilder(this).setIsOrderPreserving(isOrderPreserving)
.setOrderPreservingColumnCount(orderPreservingColumnCount).build();
159     }

{code}

Then when we compile {{order by pk2,pk1}} in {{OrderByCompiler.compile}}, because {{the GroupBy.isOrderPreserving}}
is true, and {{order by pk2,pk1}} is consistent with the GroupBy.expressions {{group by pk2,pk1}}
created in above  {{GroupBy.compile}} method, so the result of  {{OrderByCompiler.compile}}
is {{OrderBy.FWD_ROW_KEY_ORDER_BY}}.

But in fact,because the actual GroupBy.expression is {{group by pk1,pk2}},so we need to execute
 {{order by pk2,pk1}} after
the {{group by pk1,pk2}}.


> GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving
> ---------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4690
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4690
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.13.2
>            Reporter: chenglei
>            Priority: Critical
>         Attachments: PHOENIX-4690_v1.patch
>
>
> Given a table :
> {code}
>  create table test ( 
>        pk1 integer not null , 
>        pk2 integer not null, 
>        v integer, 
>        CONSTRAINT TEST_PK PRIMARY KEY (pk1,pk2))
> {code}
> and some data:
> {code}
> +------+------+-----+
> | PK1  | PK2  |  V  |
> +------+------+-----+
> | 1    | 8    | 10  |
> | 1    | 9    | 11  |
> | 2    | 3    | 13  |
> | 2    | 7    | 15  |
> | 3    | 2    | 17  |
> +------+------+-----+
> {code}
> for following sql :
> {code}
>     select pk2,pk1,count(v) from test group by pk2,pk1 order by pk2,pk1
> {code}
> the expected result is :
> {code}
> +------+------+-----------+
> | PK2  | PK1  | COUNT(V)  |
> +------+------+-----------+
> | 2    | 3    | 1         |
> | 3    | 2    | 1         |
> | 7    | 2    | 1         |
> | 8    | 1    | 1         |
> | 9    | 1    | 1         |
> +------+------+-----------+
> {code}
> but the actual result is :
> {code}
> +------+------+-----------+
> | PK2  | PK1  | COUNT(V)  |
> +------+------+-----------+
> | 8    | 1    | 1         |
> | 9    | 1    | 1         |
> | 3    | 2    | 1         |
> | 7    | 2    | 1         |
> | 2    | 3    | 1         |
> +------+------+-----------+
> {code}
> The problem is caused by the {{GroupBy.compile}}, obviously, in line 154, for {{group
by pk2,pk1}},
> {{OrderPreservingTracker.isOrderPreserving}} is true by reorder the {{pk2,pk1}} to {{pk1,pk2}}
following the order of PK columns {{pk1,pk2}},  but in line 158, for the new GroupBy, the
GroupBy.expressions is still {{pk2,pk1}},not the actual {{pk1,pk2}}:
> {code}
> 141       public GroupBy compile(StatementContext context, TupleProjector tupleProjector)
throws SQLException {
> 142              boolean isOrderPreserving = this.isOrderPreserving;
> 143              int orderPreservingColumnCount = 0;
> 144              if (isOrderPreserving) {
> 145                    OrderPreservingTracker tracker = new OrderPreservingTracker(context,
GroupBy.EMPTY_GROUP_BY, Ordering.UNORDERED, expressions.size(), tupleProjector);
> 146              for (int i = 0; i < expressions.size(); i++) {
> 147                    Expression expression = expressions.get(i);
> 148                    tracker.track(expression);
> 149                }
> 150                
> 151                // This is true if the GROUP BY is composed of only PK columns. We
further check here that
> 152                // there are no "gaps" in the PK columns positions used (i.e. we start
with the first PK
> 153                // column and use each subsequent one in PK order).
> 154                isOrderPreserving = tracker.isOrderPreserving();
> 155                orderPreservingColumnCount = tracker.getOrderPreservingColumnCount();
> 156            }
> 157            if (isOrderPreserving || isUngroupedAggregate) {
> 158                return new GroupBy.GroupByBuilder(this).setIsOrderPreserving(isOrderPreserving)
> .setOrderPreservingColumnCount(orderPreservingColumnCount).build();
> 159     }
> {code}
> Then when we compile {{order by pk2,pk1}} in {{OrderByCompiler.compile}}, because {{the
GroupBy.isOrderPreserving}} is true, and {{order by pk2,pk1}} is consistent with the GroupBy.expressions
{{group by pk2,pk1}} created in above  {{GroupBy.compile}} method, so the result of  {{OrderByCompiler.compile}}
is {{OrderBy.FWD_ROW_KEY_ORDER_BY}}.
> But in fact,because the actual GroupBy.expressions is {{group by pk1,pk2}},so we need
to execute  {{order by pk2,pk1}} after the {{group by pk1,pk2}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message