hive-issues mailing list archives

From "Hari Sankar Sivarama Subramaniyan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-12084) Hive queries with ORDER BY and large LIMIT fail with OutOfMemoryError Java heap space
Date Wed, 21 Oct 2015 20:15:27 GMT

     [ https://issues.apache.org/jira/browse/HIVE-12084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12084:
-----------------------------------------------------
    Attachment: HIVE-12084.4.patch

[~jpullokkaran] Can you please review patch #4?

> Hive queries with ORDER BY and large LIMIT fail with OutOfMemoryError Java heap space
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-12084
>                 URL: https://issues.apache.org/jira/browse/HIVE-12084
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>         Attachments: HIVE-12084.1.patch, HIVE-12084.2.patch, HIVE-12084.3.patch, HIVE-12084.4.patch
>
>
> STEPS TO REPRODUCE:
> {code}
> CREATE TABLE `sample_07` (`code` string, `description` string, `total_emp` int, `salary` int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TextFile;
> load data local inpath 'sample_07.csv'  into table sample_07;
> set hive.limit.pushdown.memory.usage=0.9999;
> select * from sample_07 order by salary LIMIT 999999999;
> {code}
> This will result in 
> {code}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.hadoop.hive.ql.exec.TopNHash.initialize(TopNHash.java:113)
> 	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initializeOp(ReduceSinkOperator.java:234)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:68)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
> {code}
> The basic issue lies with the top-n optimization: the in-memory structure it builds needs an upper bound.
> Ideally we would detect that the bytes to be allocated will exceed the budget allowed by
> "hive.limit.pushdown.memory.usage" without actually attempting the allocation.
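> Below is a rough, illustrative sketch (plain Java, not the attached patch) of the kind of up-front size check described above; the class name, helper, and per-row constant are placeholders rather than actual Hive internals:
> {code}
> // Hypothetical sketch: estimate the top-n buffer before allocating it and skip
> // the optimization when the estimate exceeds the configured heap fraction.
> public class TopNSizeCheck {
>
>   // Placeholder per-row bookkeeping cost (index entries plus key bytes); not Hive's real figure.
>   private static final long ESTIMATED_BYTES_PER_ROW = 64L;
>
>   // Returns true if a top-n buffer for 'limit' rows would fit within the fraction
>   // of the heap allowed by hive.limit.pushdown.memory.usage.
>   static boolean topNFits(long limit, float memoryUsageFraction) {
>     long heapBudget = (long) ((double) Runtime.getRuntime().maxMemory() * memoryUsageFraction);
>     return limit * ESTIMATED_BYTES_PER_ROW <= heapBudget;
>   }
>
>   public static void main(String[] args) {
>     // Values from the repro: LIMIT 999999999 with hive.limit.pushdown.memory.usage=0.9999.
>     if (!topNFits(999999999L, 0.9999f)) {
>       // This is where the top-n pushdown would be disabled so the query falls back
>       // to a full sort instead of throwing OutOfMemoryError.
>       System.out.println("top-n pushdown disabled: estimated buffer exceeds heap budget");
>     }
>   }
> }
> {code}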



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
