phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ankit Singhal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3131) improve "order by " performance with aggregated query
Date Tue, 02 Aug 2016 11:27:20 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403791#comment-15403791
] 

Ankit Singhal commented on PHOENIX-3131:
----------------------------------------

[~tuyuri], Yeah,80k should not be much(as you have 4 regions only, so maximum you can get
80k*4 for merge sort too).
So you might be spending much time on server, can you confirm the time taken by below query(without
limit) , so that we can confirm how much time it take to read all and complete region.(And
then we can add sorting time(0.3sec *4) to this to come to the time of your slow query)
{code}
select url,sum(pageview) as pv FROM pageview_site where dt > to_date ('2016-06-01') group
by url
{code}

Because,  when query is run with limit and group by on first primary key, we do optimization
that we read first region(if salt is not used) and all region(if salt is used) with number
of records equal to limit.
{code}
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] LIMIT 102 GROUPS
{code}

against, complete region when limit is not specified

{code}
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] LIMIT 102 GROUPS
{code}



> improve "order by " performance with aggregated query 
> ------------------------------------------------------
>
>                 Key: PHOENIX-3131
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3131
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.8.0
>            Reporter: tu nguyen khac
>            Priority: Critical
>
> I created a table in phoenix with query : ( 4 node , ram 8gb, 4 cores / node ) 
> CREATE TABLE pageview_site (
>     url varchar(255) not null,
>     pageview bigint,
>     dt date not null,
>     CONSTRAINT PK PRIMARY KEY (url, dt ROW_TIMESTAMP)
> ) SALT_BUCKETS = 4;
> After that : 
> 1. I tried to upsert about : 13 milions rows to this table . 
> 2. Run 2 queries : 
>     a. select url,sum(pageview) as pv FROM pageview_site where dt > to_date ('2016-06-01')
group by url limit 100 offset 2;
> the duration this query  in about : 0.5 second
>     b. select url,sum(pageview) as pv FROM pageview_site where dt > to_date ('2016-06-01')
group by ur order by pv descl limit 100 offset 2;
> the duration this query  in about : 9.5 seconds
> what happens with 2nd query ?? I think we should improve performance for "order by "
command 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message