phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ankit Singhal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3131) improve "order by " performance with aggregated query
Date Tue, 02 Aug 2016 08:05:20 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403575#comment-15403575
] 

Ankit Singhal commented on PHOENIX-3131:
----------------------------------------

[~tuyuri], how many distinct urls are there in a table. you can get it by running
{code}
select count(distinct(url)) from pageview_site where dt > to_date ('2016-06-01') ;
{code}

Because ,we fetch all the urls with partial aggregates from the regionserver and combine them
with merge sort and do the final sort on the aggregate values. And below final sort can take
time depending upon the number of distinct URLs. 
{code}
CLIENT TOP 102 ROWS SORTED BY [SUM(PAGEVIEW) DESC]
{code}

You may improve some performance by tuning below parameters(https://phoenix.apache.org/tuning.html).

{code}
phoenix.query.spoolThresholdBytes
phoenix.groupby.estimatedDistinctValues
{code}


> improve "order by " performance with aggregated query 
> ------------------------------------------------------
>
>                 Key: PHOENIX-3131
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3131
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.8.0
>            Reporter: tu nguyen khac
>            Priority: Critical
>
> I created a table in phoenix with query : ( 4 node , ram 8gb, 4 cores / node ) 
> CREATE TABLE pageview_site (
>     url varchar(255) not null,
>     pageview bigint,
>     dt date not null,
>     CONSTRAINT PK PRIMARY KEY (url, dt ROW_TIMESTAMP)
> ) SALT_BUCKETS = 4;
> After that : 
> 1. I tried to upsert about : 13 milions rows to this table . 
> 2. Run 2 queries : 
>     a. select url,sum(pageview) as pv FROM pageview_site where dt > to_date ('2016-06-01')
group by url limit 100 offset 2;
> the duration this query  in about : 0.5 second
>     b. select url,sum(pageview) as pv FROM pageview_site where dt > to_date ('2016-06-01')
group by ur order by pv descl limit 100 offset 2;
> the duration this query  in about : 9.5 seconds
> what happens with 2nd query ?? I think we should improve performance for "order by "
command 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message