hive-issues mailing list archives

From "Andrew Sherman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17868) Make queries in spark_local_queries.q have deterministic output
Date Fri, 20 Oct 2017 22:22:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213374#comment-16213374 ]

Andrew Sherman commented on HIVE-17868:
---------------------------------------

Thanks [~xuefuz] for the suggestion. I looked at --SORT_QUERY_RESULTS, and it seems to sort
the output only after the query has run.
So with a query like
{noformat}
select key, count(*) from src group by key limit 10
{noformat}
--SORT_QUERY_RESULTS will sort the output, but the rows are not sorted before the limit is
applied, so which 10 rows survive the limit can still vary from run to run. That is not
enough to make the query deterministic. But using
{noformat}
select key, count(*) from src group by key order by key limit 10
{noformat}
is, I think, deterministic.
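
To illustrate the difference, here is a small Python sketch rather than Hive itself. It assumes that 'group by' may emit its groups in an arbitrary order, which is modelled here with a seeded shuffle. The function names and the sample keys are made up for the illustration and are not taken from spark_local_queries.q.

```python
import random
from collections import Counter


def group_counts(rows, seed):
    """Count rows per key and return the groups in an arbitrary order.

    The seeded shuffle is an assumption that stands in for the unspecified
    order in which a GROUP BY may emit its groups.
    """
    groups = list(Counter(rows).items())
    random.Random(seed).shuffle(groups)
    return groups


def limit_then_sort(rows, n, seed):
    # Models "group by key limit n" with the output sorted afterwards,
    # which is what sorting the results after the query has run amounts to.
    return sorted(group_counts(rows, seed)[:n])


def sort_then_limit(rows, n, seed):
    # Models "group by key order by key limit n".
    return sorted(group_counts(rows, seed))[:n]


# Hypothetical stand-in for the keys in src: 50 distinct keys, 4 rows each.
rows = [str(i % 50) for i in range(200)]

# Collect the distinct results seen across 20 simulated runs.
limit_first = {tuple(limit_then_sort(rows, 10, s)) for s in range(20)}
sort_first = {tuple(sort_then_limit(rows, 10, s)) for s in range(20)}

print("distinct results, limit then sort:", len(limit_first))
print("distinct results, sort then limit:", len(sort_first))
```

In this model, sorting before the limit always gives the same 10 rows, while sorting after the limit only tidies up whichever 10 rows happened to get through.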

The queries in spark_local_queries.q are very small and adding the 'order by' does not seem
to make a significant difference to elapsed time.


> Make queries in spark_local_queries.q have deterministic output
> ---------------------------------------------------------------
>
>                 Key: HIVE-17868
>                 URL: https://issues.apache.org/jira/browse/HIVE-17868
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Andrew Sherman
>            Assignee: Andrew Sherman
>
> Add 'order by' to queries so that output is always the same



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
