From dev-return-56340-archive-asf-public=cust-asf.ponee.io@phoenix.apache.org  Fri May  3 20:12:03 2019
Return-Path: <dev-return-56340-archive-asf-public=cust-asf.ponee.io@phoenix.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 4C10818067E
	for <archive-asf-public@cust-asf.ponee.io>; Fri,  3 May 2019 22:12:03 +0200 (CEST)
Received: (qmail 66211 invoked by uid 500); 3 May 2019 20:12:01 -0000
Mailing-List: contact dev-help@phoenix.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@phoenix.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@phoenix.apache.org>
List-Post: <mailto:dev@phoenix.apache.org>
List-Id: <dev.phoenix.apache.org>
Reply-To: dev@phoenix.apache.org
Delivered-To: mailing list dev@phoenix.apache.org
Received: (qmail 66180 invoked by uid 99); 3 May 2019 20:12:01 -0000
Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139)
    by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 03 May 2019 20:12:01 +0000
Received: from jira-lw-us.apache.org (unknown [207.244.88.139])
	by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id B613BE2AF3
	for <dev@phoenix.apache.org>; Fri,  3 May 2019 20:12:00 +0000 (UTC)
Received: from jira-lw-us.apache.org (localhost [127.0.0.1])
	by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 74F7E25815
	for <dev@phoenix.apache.org>; Fri,  3 May 2019 20:12:00 +0000 (UTC)
Date: Fri, 3 May 2019 20:12:00 +0000 (UTC)
From: "Bin Shi (JIRA)" <jira@apache.org>
To: dev@phoenix.apache.org
Message-ID: <JIRA.13187369.1537896007000.195561.1556914320475@Atlassian.JIRA>
In-Reply-To: <JIRA.13187369.1537896007000@Atlassian.JIRA>
References: <JIRA.13187369.1537896007000@Atlassian.JIRA> <JIRA.13187369.1537896007968@jira-lw-us.apache.org>
Subject: [jira] [Updated] (PHOENIX-4925) Use a Variant Segment tree to
 organize Guide Post Info.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


     [ https://issues.apache.org/jira/browse/PHOENIX-4925?page=3Dcom.atlass=
ian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4925:
-----------------------------
    Summary: Use a Variant Segment tree to organize Guide Post Info.  (was:=
 Use a Variant Segment tree to organize Guide Post Info)

> Use a Variant Segment tree to organize Guide Post Info.
> -------------------------------------------------------
>
>                 Key: PHOENIX-4925
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4925
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Bin Shi
>            Assignee: Bin Shi
>            Priority: Major
>          Time Spent: 6h
>  Remaining Estimate: 0h
>
> As reported, Query compilation (for the sample queries showed below), esp=
ecially deriving estimation and generating parallel scans from guide posts,=
 becomes much slower after we introduced Phoenix Stats.=C2=A0
>  a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
>  b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c =3D =E2=80=98x=
=E2=80=99 ORDER BY Pk1__c
>  c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c =3D =E2=80=98x=E2=
=80=99 ORDER BY pk1__c,pk2__c
>  d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c =3D =E2=80=98x=E2=
=80=99 AND nonpk1__c ORDER BY pk1__c,pk2__c
>  e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >=3D 'd' AND pk__c=
 <=3D 'm' OR pk__c >=3D 'o' AND pk__c <=3D 'x' ORDER BY pk__c // pk__c is t=
he only column to make the primary key.
>  =C2=A0
>  By using prefix encoding for guide post info, we have to decode and trav=
erse guide posts sequentially, which causes time complexity in BaseResultIt=
erators.getParallelScan(...) to be O( n ) , where n is the total count of g=
uide posts.
> According to PHOENIX-2417, to reduce footprint in client cache and over t=
ransmition, the prefix encoding is used as in-memory and over-the-wire enco=
ding for guide post info.
> We can use Segment Tree to address both memory and performance concerns. =
The guide posts are partitioned to k=C2=A0chunks (k=3D1024?), each=C2=A0chu=
nk is encoded by prefix encoding and the encoded data is a leaf node of the=
 tree. The inner node contains summary info (the count of rows, the data si=
ze) of the sub tree rooted at the inner node.
> With this tree like data structure, compared to the current data structur=
e, the increased size (mainly coming from the n/k-1 inner nodes) is ignorab=
le. The time complexity for queries a, b, c can be reduced to O(m) where m =
is the total count of regions; the time complexity for "EXPLAN" queries a, =
b, c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)=
", it can even be reduced to O(1). For queries d and e, the time complexity=
 to find the start of target scan ranges can be reduced to O(log(n/k)).
> The tree can also integrate AVL and B+ characteristics to support partial=
 load/unload when interacting with stats client cache.
> =C2=A0


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)