phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2606) Cursor support in Phoenix
Date Thu, 18 Feb 2016 09:10:18 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151990#comment-15151990 ]

James Taylor commented on PHOENIX-2606:
---------------------------------------

For group by, we currently sort the results of each scan on the client side and then do the
final merge/aggregation. This sort uses memory-mapped files, so in a way it's a kind of spooling.
We could potentially do the sort on the server side instead and then pull over the data only as
the final merge is done. Here are some relevant JIRAs for improvements: PHOENIX-1217, PHOENIX-1006.
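
To make the "sort per scan, then final merge/aggregation" step concrete, here is a minimal
in-memory sketch. It is not Phoenix's actual code: the class GroupByMerge, the method
mergeSortedPartials, and the use of String group keys with Long partial counts are all
assumptions made for illustration. Each input list stands in for one scan's output, already
sorted by group key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration only; these names do not come from Phoenix.
final class GroupByMerge {

    // Current head row of one sorted scan, plus the rest of that scan.
    private static final class Head {
        final Map.Entry<String, Long> entry;
        final Iterator<Map.Entry<String, Long>> rest;

        Head(Map.Entry<String, Long> entry, Iterator<Map.Entry<String, Long>> rest) {
            this.entry = entry;
            this.rest = rest;
        }
    }

    // Merges per-scan (groupKey, partialCount) lists, each sorted by groupKey,
    // into one sorted list with the partial counts summed per group.
    static List<Map.Entry<String, Long>> mergeSortedPartials(
            List<List<Map.Entry<String, Long>>> scans) {
        PriorityQueue<Head> heap = new PriorityQueue<Head>(
                (a, b) -> a.entry.getKey().compareTo(b.entry.getKey()));
        for (List<Map.Entry<String, Long>> scan : scans) {
            Iterator<Map.Entry<String, Long>> it = scan.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }

        List<Map.Entry<String, Long>> merged = new ArrayList<Map.Entry<String, Long>>();
        String currentKey = null;
        long currentSum = 0L;
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            String key = head.entry.getKey();
            if (currentKey != null && !currentKey.equals(key)) {
                // The heap yields keys in order, so the previous group is complete.
                merged.add(new SimpleEntry<String, Long>(currentKey, currentSum));
                currentSum = 0L;
            }
            currentKey = key;
            currentSum += head.entry.getValue();
            if (head.rest.hasNext()) {
                heap.add(new Head(head.rest.next(), head.rest));
            }
        }
        if (currentKey != null) {
            merged.add(new SimpleEntry<String, Long>(currentKey, currentSum));
        }
        return merged;
    }
}
```

Because the merge only ever holds one head row per scan, the sorted inputs could in principle
be streamed from wherever the sort happened, which is the point of the server-side suggestion.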

> Cursor support in Phoenix
> -------------------------
>
>                 Key: PHOENIX-2606
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2606
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Sudarshan Kadambi
>
> Phoenix should look to support a cursor model where the user could set the fetch size
> to limit the number of rows that are fetched in each batch. Each batch of result rows would
> be accompanied by a flag indicating whether there are more rows to be fetched for a given
> query.
> The state management for the cursor could be done on the client side or the server side (i.e.
> HBase, not the Query Server). The client-side state management could involve capturing the
> last key in the batch and using that as the start key for the subsequent scan operation. The
> downside of this model is that if there were any intervening inserts or deletes in the result
> set of the query, backtracking on the cursor would reflect these changes (consider
> a page down, followed by a page up showing a different set of result rows). Similarly, if
> the cursor is defined over the results of a join or an aggregation, these operations would
> need to be performed again when the next batch of result rows is to be fetched.
> So an alternate approach could be to manage the state on the server side, wherein there is a
> query context area in the RegionServers (or maybe just a temporary table) and the cursor
> results are fetched from there. This ensures that the cursor has snapshot isolation semantics.
> I think both models make sense, but it might be best to start with the state management
> completely on the client side.
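
The client-side model described above (remember the last key of a batch, start the next scan
just after it, and return a "more rows" flag) can be sketched roughly as follows. This is an
in-memory illustration, not a Phoenix API: ClientSideCursor, Batch, and the TreeMap standing
in for a row-key-ordered table are all hypothetical names chosen for the example.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical illustration only; a NavigableMap stands in for a key-ordered table.
final class ClientSideCursor {

    // One batch of rows plus the flag saying whether more rows remain.
    static final class Batch {
        final List<Map.Entry<String, String>> rows;
        final boolean hasMore;

        Batch(List<Map.Entry<String, String>> rows, boolean hasMore) {
            this.rows = rows;
            this.hasMore = hasMore;
        }
    }

    private final NavigableMap<String, String> table;
    private final int fetchSize;
    private String lastKey; // the only cursor state kept by the client; null before the first batch

    ClientSideCursor(NavigableMap<String, String> table, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        this.table = table;
        this.fetchSize = fetchSize;
    }

    // Fetches up to fetchSize rows, starting strictly after the last key returned so far.
    Batch next() {
        NavigableMap<String, String> remaining =
                (lastKey == null) ? table : table.tailMap(lastKey, false);
        List<Map.Entry<String, String>> rows = new ArrayList<Map.Entry<String, String>>();
        Iterator<Map.Entry<String, String>> it = remaining.entrySet().iterator();
        while (rows.size() < fetchSize && it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            // Copy the entry so the batch does not change if the table is modified later.
            rows.add(new SimpleImmutableEntry<String, String>(e.getKey(), e.getValue()));
            lastKey = e.getKey();
        }
        return new Batch(rows, it.hasNext());
    }
}
```

Each call to next() re-reads the live table, so a row inserted between two calls with a key
after lastKey shows up in a later batch. That is the lack of snapshot isolation noted above,
and it is what the server-side approach is meant to avoid.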



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
