From: "Chinmay Kulkarni (Jira)"
To: dev@phoenix.apache.org
Date: Sat, 21 Dec 2019 00:57:05 +0000 (UTC)
Subject: [jira] [Closed] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN

     [ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinmay Kulkarni closed PHOENIX-4751.
-------------------------------------

Bulk closing Jiras for the 4.15.0 release.

> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4751
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.0, 4.13.1
>            Reporter: Gerald Sangudi
>            Assignee: Gerald Sangudi
>            Priority: Major
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: 0001-PHOENIX-4751-Add-HASH_AGGREGATE-hint.4.x-HBase-1.4.patch,
>                      0001-PHOENIX-4751-Implement-client-side-has.4.x-HBase-1.4.patch,
>                      0001-PHOENIX-4751-Implement-client-side-hash-aggre.master.patch,
>                      0002-PHOENIX-4751-Begin-implementation-of-c.4.x-HBase-1.4.patch,
>                      0003-PHOENIX-4751-Generated-aggregated-resu.4.x-HBase-1.4.patch,
>                      0004-PHOENIX-4751-Sort-results-of-client-ha.4.x-HBase-1.4.patch,
>                      0005-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch,
>                      0006-PHOENIX-4751-Fix-and-run-integration-t.4.x-HBase-1.4.patch,
>                      0007-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch,
>                      0008-PHOENIX-4751-Verify-EXPLAIN-plan-for-b.4.x-HBase-1.4.patch,
>                      0009-PHOENIX-4751-Standardize-null-checks-a.4.x-HBase-1.4.patch,
>                      0010-PHOENIX-4751-Abort-when-client-aggrega.4.x-HBase-1.4.patch,
>                      0011-PHOENIX-4751-Use-Phoenix-memory-mgmt-t.4.x-HBase-1.4.patch,
>                      0012-PHOENIX-4751-Remove-extra-memory-limit.4.x-HBase-1.4.patch,
>                      0013-PHOENIX-4751-Sort-only-when-necessary.4.x-HBase-1.4.patch,
>                      0014-PHOENIX-4751-Sort-only-when-necessary-.4.x-HBase-1.4.patch,
>                      0015-PHOENIX-4751-Show-client-hash-aggregat.4.x-HBase-1.4.patch,
>                      0016-PHOENIX-4751-Handle-reverse-sort-add-c.4.x-HBase-1.4.patch
>
>
> A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash aggregation in some cases, for improved performance.
> When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash aggregation. It instead performs a CLIENT SORT followed by a CLIENT AGGREGATE.
> The performance can be improved if (a) the GROUP BY output does not need to be sorted, and (b) the GROUP BY input is large enough and has low cardinality.
> The hash aggregation can initially be a hint. Here is an example from Phoenix 4.13.1 that would benefit from hash aggregation if the GROUP BY input is large with low cardinality.
> CREATE TABLE unsalted (
>     keyA BIGINT NOT NULL,
>     keyB BIGINT NOT NULL,
>     val SMALLINT,
>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
> );
> EXPLAIN
> SELECT /*+ USE_SORT_MERGE_JOIN */
>     t1.val v1, t2.val v2, COUNT(*) c
> FROM unsalted t1 JOIN unsalted t2
>     ON (t1.keyA = t2.keyA)
> GROUP BY t1.val, t2.val;
> +-----------------------------------------------------------+----------------+---------------+
> | PLAN                                                      | EST_BYTES_READ | EST_ROWS_READ |
> +-----------------------------------------------------------+----------------+---------------+
> | SORT-MERGE-JOIN (INNER) TABLES                            | null           | null          |
> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null           | null          |
> | AND                                                       | null           | null          |
> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null           | null          |
> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]             | null           | null          |
> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]   | null           | null          |
> +-----------------------------------------------------------+----------------+---------------+



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
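
Editorial note: the two aggregation strategies contrasted in the issue can be illustrated with a minimal Python sketch. This is not Phoenix code and does not reflect how the attached patches implement the feature. The function names and the sample `joined` rows are hypothetical, chosen only to mimic a join output with many rows and few distinct (t1.val, t2.val) groups.

```python
from collections import Counter
from itertools import groupby


def sort_then_aggregate(rows):
    """Sort all rows by group key, then count each run of equal keys.

    Loosely mirrors the CLIENT SORT + CLIENT AGGREGATE shape of the plan:
    the whole input is sorted (O(n log n)) before any group is counted.
    """
    ordered = sorted(rows)
    return {key: sum(1 for _ in run) for key, run in groupby(ordered)}


def hash_aggregate(rows):
    """Count each group key in a hash table, with no sort of the input.

    One pass over the input; memory grows with the number of distinct
    groups, which is why low cardinality matters.
    """
    return dict(Counter(rows))


# Hypothetical join output: 12 rows, only 6 distinct (v1, v2) groups.
joined = [(i % 3, i % 2) for i in range(12)]

by_sort = sort_then_aggregate(joined)
by_hash = hash_aggregate(joined)
```

Both functions return the same counts per group. The hash version skips the sort, so its result has no guaranteed key order, which matches condition (a) in the issue: it only helps when the GROUP BY output does not need to be sorted.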