hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
Date Thu, 15 Oct 2015 10:05:05 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958607#comment-14958607 ]

Jesus Camacho Rodriguez commented on HIVE-11531:
------------------------------------------------

Awesome, thanks [~huizane]!

The sort operator for CBO (HiveSortLimit) already has support for offset and limit; most rules
involving Limit (maybe all) should have support for fetch too.

Integration with CBO would imply 1) setting the offset for the Calcite operator in SemanticAnalyzer,
2) translating the offset contained in the Calcite operator back in ASTConverter, and 3) modifying
any rule that might need to be updated to work properly with offset (if needed).

I have seen in the patch that 1) is already done. [~huizane], could you complete 2) and add
tests to offset_limit.q with CBO on to verify that it is working properly?
The problem with implementing only 1) is that we would be reading the offset from the query
and setting it in the HiveSortLimit operator, but unless 2) is completed, we would lose
it when we translate the Calcite operator back.
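
To make the concern concrete, here is a toy sketch of the round trip in Python. It is not Hive or Calcite code: LimitClause, SortLimitNode, to_plan, to_ast_fetch_only and to_ast are hypothetical stand-ins for the parsed query, the HiveSortLimit operator, step 1), and the incomplete and complete versions of step 2).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LimitClause:
    """Hypothetical stand-in for the limit information carried by the query AST."""
    fetch: int
    offset: int = 0


@dataclass(frozen=True)
class SortLimitNode:
    """Hypothetical stand-in for the planner's sort/limit operator."""
    fetch: Optional[int]
    offset: Optional[int]


def to_plan(clause: LimitClause) -> SortLimitNode:
    # Step 1): copy both fetch and offset from the query into the operator.
    return SortLimitNode(fetch=clause.fetch, offset=clause.offset)


def to_ast_fetch_only(node: SortLimitNode) -> LimitClause:
    # Step 2) not done yet: only fetch is translated back, so offset is dropped.
    return LimitClause(fetch=node.fetch)


def to_ast(node: SortLimitNode) -> LimitClause:
    # Step 2) done: both fetch and offset survive the translation back.
    return LimitClause(fetch=node.fetch, offset=node.offset or 0)


original = LimitClause(fetch=10, offset=20)
plan = to_plan(original)
lossy = to_ast_fetch_only(plan)
faithful = to_ast(plan)
```

In this sketch plan.offset is 20, but lossy.offset falls back to 0, so the query that runs would return the first page instead of the requested one; faithful equals original.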

FYI [~jpullokkaran]

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-11531
>                 URL: https://issues.apache.org/jira/browse/HIVE-11531
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Hui Zheng
>         Attachments: HIVE-11531.WIP.1.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the form SELECT
> ... LIMIT X,Y where X,Y are coordinates inside the result to be paginated (which can be extremely
> large by itself). At present, ROW_NUMBER can be used to achieve this effect, but optimizations
> for LIMIT such as TopN in ReduceSink do not apply to ROW_NUMBER. We can add first-class support
> for "skip" to the existing limit, or improve ROW_NUMBER for better performance.
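
A rough illustration of the difference described in the issue, written as plain Python rather than Hive code. The function names are made up for this sketch, and it assumes that a limit with a skip only ever needs the first offset + count rows in sort order; how TopN in ReduceSink is actually implemented is not shown here.

```python
import heapq


def page_with_row_number(rows, offset, count):
    """ROW_NUMBER style: sort and number the whole result, then filter by row number."""
    numbered = enumerate(sorted(rows), start=1)
    return [row for n, row in numbered if offset < n <= offset + count]


def page_with_top_n(rows, offset, count):
    """Limit-with-skip style: select only the smallest offset + count rows,
    then drop the first offset of them."""
    top = heapq.nsmallest(offset + count, rows)
    return top[offset:]


rows = [27, 3, 14, 8, 42, 1, 19, 5]
page_a = page_with_row_number(rows, offset=2, count=3)
page_b = page_with_top_n(rows, offset=2, count=3)
```

Both calls return the same page, [5, 8, 14], but the second one never has to order the rows beyond the requested window.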



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
