phoenix-dev mailing list archives

From "Gerald Sangudi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
Date Mon, 30 Jul 2018 19:35:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gerald Sangudi updated PHOENIX-4751:
------------------------------------
    Attachment: 0015-PHOENIX-4751-Show-client-hash-aggregat.4.x-HBase-1.4.patch
                0014-PHOENIX-4751-Sort-only-when-necessary-.4.x-HBase-1.4.patch
                0013-PHOENIX-4751-Sort-only-when-necessary.4.x-HBase-1.4.patch
                0012-PHOENIX-4751-Remove-extra-memory-limit.4.x-HBase-1.4.patch
                0011-PHOENIX-4751-Use-Phoenix-memory-mgmt-t.4.x-HBase-1.4.patch
                0010-PHOENIX-4751-Abort-when-client-aggrega.4.x-HBase-1.4.patch
                0009-PHOENIX-4751-Standardize-null-checks-a.4.x-HBase-1.4.patch
                0008-PHOENIX-4751-Verify-EXPLAIN-plan-for-b.4.x-HBase-1.4.patch
                0007-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch
                0006-PHOENIX-4751-Fix-and-run-integration-t.4.x-HBase-1.4.patch
                0005-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch
                0004-PHOENIX-4751-Sort-results-of-client-ha.4.x-HBase-1.4.patch
                0003-PHOENIX-4751-Generated-aggregated-resu.4.x-HBase-1.4.patch
                0002-PHOENIX-4751-Begin-implementation-of-c.4.x-HBase-1.4.patch
                0001-PHOENIX-4751-Add-HASH_AGGREGATE-hint.4.x-HBase-1.4.patch

> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4751
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.0, 4.13.1
>            Reporter: Gerald Sangudi
>            Assignee: Gerald Sangudi
>            Priority: Major
>         Attachments: 0001-PHOENIX-4751-Add-HASH_AGGREGATE-hint.4.x-HBase-1.4.patch, 0002-PHOENIX-4751-Begin-implementation-of-c.4.x-HBase-1.4.patch,
>                      0003-PHOENIX-4751-Generated-aggregated-resu.4.x-HBase-1.4.patch, 0004-PHOENIX-4751-Sort-results-of-client-ha.4.x-HBase-1.4.patch,
>                      0005-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch, 0006-PHOENIX-4751-Fix-and-run-integration-t.4.x-HBase-1.4.patch,
>                      0007-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch, 0008-PHOENIX-4751-Verify-EXPLAIN-plan-for-b.4.x-HBase-1.4.patch,
>                      0009-PHOENIX-4751-Standardize-null-checks-a.4.x-HBase-1.4.patch, 0010-PHOENIX-4751-Abort-when-client-aggrega.4.x-HBase-1.4.patch,
>                      0011-PHOENIX-4751-Use-Phoenix-memory-mgmt-t.4.x-HBase-1.4.patch, 0012-PHOENIX-4751-Remove-extra-memory-limit.4.x-HBase-1.4.patch,
>                      0013-PHOENIX-4751-Sort-only-when-necessary.4.x-HBase-1.4.patch, 0014-PHOENIX-4751-Sort-only-when-necessary-.4.x-HBase-1.4.patch,
>                      0015-PHOENIX-4751-Show-client-hash-aggregat.4.x-HBase-1.4.patch
>
>
> In some cases, a GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash aggregation for better performance.
> When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash aggregation. Instead, it performs a CLIENT SORT followed by a CLIENT AGGREGATE. Performance could improve if (a) the GROUP BY output does not need to be sorted, and (b) the GROUP BY input is large and has low cardinality (few distinct groups), so that a small hash table of groups replaces a sort of the entire input.
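To make the trade-off concrete, here is a minimal Java sketch contrasting the two strategies on an in-memory list of grouping keys. It is not Phoenix code; the class `AggregationSketch` and the methods `sortAggregate` and `hashAggregate` are hypothetical names. Sort-based aggregation sorts every input row and then counts runs of equal keys, so its output is ordered as a side effect. Hash aggregation makes one pass into a `HashMap`, so its memory grows with the number of distinct groups rather than the number of rows, and its output order is unspecified.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Phoenix's implementation.
public class AggregationSketch {
    // Sort-based aggregation (the CLIENT SORT + CLIENT AGGREGATE shape):
    // sort all n input rows, then count runs of equal keys in one pass.
    // The result comes out ordered by key as a side effect of the sort.
    static Map<String, Long> sortAggregate(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        Map<String, Long> result = new LinkedHashMap<>();
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i);
            long count = 0;
            while (i < sorted.size() && sorted.get(i).equals(key)) {
                count++;
                i++;
            }
            result.put(key, count);
        }
        return result;
    }

    // Hash aggregation: a single pass with no sort; memory grows with the
    // number of distinct keys, not with the number of input rows.
    // HashMap iteration order is unspecified, so the output is unordered.
    static Map<String, Long> hashAggregate(List<String> keys) {
        Map<String, Long> result = new HashMap<>();
        for (String key : keys) {
            result.merge(key, 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Large input, low cardinality: 10,000 rows but only 3 distinct groups.
        String[] groups = {"b", "a", "c"};
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add(groups[i % groups.length]);
        }
        Map<String, Long> viaSort = sortAggregate(keys);
        Map<String, Long> viaHash = hashAggregate(keys);
        System.out.println(viaSort);                 // {a=3333, b=3334, c=3333}
        System.out.println(viaSort.equals(viaHash)); // true: same groups and counts
    }
}
```

Both methods produce the same groups and counts; the hash version skips the O(n log n) sort, which is only safe to skip when the caller does not need the groups in sorted order.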
> Hash aggregation can initially be requested through a hint. Here is an example from Phoenix 4.13.1 that would benefit from hash aggregation if the GROUP BY input were large with low cardinality:
> CREATE TABLE unsalted (
>  keyA BIGINT NOT NULL,
>  keyB BIGINT NOT NULL,
>  val SMALLINT,
>  CONSTRAINT pk PRIMARY KEY (keyA, keyB)
>  );
> EXPLAIN
>  SELECT /*+ USE_SORT_MERGE_JOIN */
>  t1.val v1, t2.val v2, COUNT(*) c
>  FROM unsalted t1 JOIN unsalted t2
>  ON (t1.keyA = t2.keyA)
>  GROUP BY t1.val, t2.val;
> +-----------------------------------------------------------+----------------+---------------+
> | PLAN                                                      | EST_BYTES_READ | EST_ROWS_READ |
> +-----------------------------------------------------------+----------------+---------------+
> | SORT-MERGE-JOIN (INNER) TABLES                            | null           | null          |
> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null           | null          |
> | AND                                                       | null           | null          |
> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null           | null          |
> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]             | null           | null          |
> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]   | null           | null          |
> +-----------------------------------------------------------+----------------+---------------+
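The data flow of this example query can be mimicked with a small in-memory Java sketch: both inputs are sorted on keyA, merged to emit joined (t1.val, t2.val) pairs, and those pairs are counted with a `HashMap` in place of the CLIENT SORTED BY and CLIENT AGGREGATE steps. This is only an illustration under assumptions: `SortMergeJoinHashAgg`, `Row`, and `joinAndCount` are hypothetical names, the nullable SMALLINT `val` is modeled as `Short`, and real Phoenix execution streams rows from HBase scans rather than from Java lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Phoenix's implementation.
public class SortMergeJoinHashAgg {
    // One row of the UNSALTED table: PRIMARY KEY (keyA, keyB), nullable SMALLINT val.
    static final class Row {
        final long keyA;
        final long keyB;
        final Short val;
        Row(long keyA, long keyB, Short val) { this.keyA = keyA; this.keyB = keyB; this.val = val; }
    }

    // Mimics: SELECT t1.val, t2.val, COUNT(*) FROM unsalted t1 JOIN unsalted t2
    //         ON (t1.keyA = t2.keyA) GROUP BY t1.val, t2.val
    // as a sort-merge inner join followed by hash aggregation.
    static Map<List<Short>, Long> joinAndCount(List<Row> left, List<Row> right) {
        List<Row> l = new ArrayList<>(left);
        List<Row> r = new ArrayList<>(right);
        Comparator<Row> byKeyA = Comparator.comparingLong(row -> row.keyA);
        l.sort(byKeyA);
        r.sort(byKeyA);

        Map<List<Short>, Long> counts = new HashMap<>();
        int i = 0, j = 0;
        while (i < l.size() && j < r.size()) {
            long a = l.get(i).keyA, b = r.get(j).keyA;
            if (a < b) {
                i++;
            } else if (a > b) {
                j++;
            } else {
                // Find the run of equal keyA on each side and emit their cross product.
                int iEnd = i, jEnd = j;
                while (iEnd < l.size() && l.get(iEnd).keyA == a) iEnd++;
                while (jEnd < r.size() && r.get(jEnd).keyA == a) jEnd++;
                for (int x = i; x < iEnd; x++) {
                    for (int y = j; y < jEnd; y++) {
                        // Group key (t1.val, t2.val); Arrays.asList permits null values.
                        List<Short> groupKey = Arrays.asList(l.get(x).val, r.get(y).val);
                        counts.merge(groupKey, 1L, Long::sum);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row(1, 1, (short) 10));
        table.add(new Row(1, 2, (short) 20));
        table.add(new Row(2, 1, (short) 10));
        // Self-join, as in the EXPLAIN example.
        Map<List<Short>, Long> counts = joinAndCount(table, table);
        System.out.println(counts.size());                                      // 4
        System.out.println(counts.get(Arrays.asList((short) 10, (short) 10))); // 2
    }
}
```

Because `HashMap` has no defined iteration order, the groups come out unsorted, which is acceptable only under condition (a) in the issue description: the GROUP BY output does not need to be sorted.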



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
