phoenix-issues mailing list archives

From "Xinyi Yan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2988) Replace COUNT(DISTINCT...) with COUNT(...) when possible
Date Tue, 12 Feb 2019 06:14:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765712#comment-16765712 ]

Xinyi Yan commented on PHOENIX-2988:
------------------------------------

[~jamestaylor] Do we already have an implementation of this? If not, I would like to take this task. Thanks.

> Replace COUNT(DISTINCT...) with COUNT(...) when possible
> --------------------------------------------------------
>
>                 Key: PHOENIX-2988
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2988
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Priority: Major
>
> An optimization that would really benefit the SELECT COUNT(DISTINCT pkCol) case: if there's
> only a single COUNT(DISTINCT pkCol) and the GroupBy ends up being order preserving, you can
> replace the COUNT(DISTINCT pkCol) with a COUNT(pkCol) in the SELECT, HAVING, and ORDER BY
> clauses. That prevents the DistinctValueWithCountServerAggregator, which keeps a Map of all
> unique values, from being used. Instead, only a single overall count is kept, which is all
> we need thanks to your DistinctPrefixFilter.
> A few considerations in the implementation:
> * Pass select through in the call to groupBy.compile() in QueryCompiler and change the
> return type to return a new select (as the SELECT, HAVING, and ORDER BY may have been rewritten).
> Probably easiest if the GroupBy object is just mutated in place.
> * Within the groupBy.compile() call, use a visitor on the SELECT, HAVING, and ORDER BY
> clauses to do the rewriting. You can do that by deriving a class from ParseNodeRewriter and overriding
> the {{visitLeave(final FunctionParseNode node, List<ParseNode> nodes)}} method to return
> a new COUNT parse node with the {{nodes}} passed in as children if {{node}} equals the DistinctCountParseNode
> that you replaced in the select statement.
> * The compilation of the HAVING clause should be moved after the call to groupBy.compile()
> in QueryCompiler, since it may have been rewritten in that call, like this:
> {code}
>         select = groupBy.compile(context, select, innerPlanTupleProjector);
>         Expression having = HavingCompiler.compile(context, select, groupBy);
> {code}
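
The visitor-based rewrite described in the second bullet can be sketched on a toy
parse tree. This is only an illustration under stated assumptions: Node, Rewriter,
and DistinctCountRewriter are hypothetical stand-ins written for this example and
are not Phoenix's ParseNode or ParseNodeRewriter classes, whose real signatures
may differ. The sketch shows the idea of a bottom-up visitLeave hook that swaps
any node equal to the chosen COUNT(DISTINCT ...) node for a plain COUNT over the
already-rewritten children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Toy model only: Node, Rewriter and DistinctCountRewriter are hypothetical
// stand-ins, not Phoenix's ParseNode / ParseNodeRewriter classes.
final class CountDistinctRewriteSketch {

    /** Immutable parse node with structural equality. */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this(name, Arrays.asList(children));
        }

        Node(String name, List<Node> children) {
            this.name = name;
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Node)) return false;
            Node other = (Node) o;
            return name.equals(other.name) && children.equals(other.children);
        }

        @Override public int hashCode() {
            return Objects.hash(name, children);
        }

        @Override public String toString() {
            if (children.isEmpty()) return name;
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /** Bottom-up rewriter: children are rewritten first, then visitLeave is called. */
    static class Rewriter {
        Node rewrite(Node node) {
            List<Node> rewritten = new ArrayList<>();
            for (Node child : node.children) {
                rewritten.add(rewrite(child));
            }
            return visitLeave(node, rewritten);
        }

        /** Default: rebuild the node unchanged around its rewritten children. */
        Node visitLeave(Node node, List<Node> children) {
            return new Node(node.name, children);
        }
    }

    /** Replaces every node equal to the target distinct count with a plain COUNT. */
    static final class DistinctCountRewriter extends Rewriter {
        private final Node target;

        DistinctCountRewriter(Node target) {
            this.target = target;
        }

        @Override Node visitLeave(Node node, List<Node> children) {
            if (node.equals(target)) {
                return new Node("COUNT", children);
            }
            return super.visitLeave(node, children);
        }
    }

    /** Rewrites a SELECT expression and a HAVING expression with the same rewriter. */
    static String demo() {
        Node distinct = new Node("COUNT_DISTINCT", new Node("pk"));
        Node having = new Node("GT",
                new Node("COUNT_DISTINCT", new Node("pk")), new Node("10"));
        Rewriter rewriter = new DistinctCountRewriter(distinct);
        return rewriter.rewrite(distinct) + " HAVING " + rewriter.rewrite(having);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running main prints "COUNT(pk) HAVING GT(COUNT(pk), 10)". Matching by equality
rather than identity is what lets the same rewriter catch the copy of the
aggregate that appears in the HAVING expression.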



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
