hive-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
Date Tue, 22 Oct 2013 12:00:47 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801748#comment-13801748
] 

Hudson commented on HIVE-4957:
------------------------------

FAILURE: Integrated in Hive-trunk-h0.21 #2413 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2413/])
HIVE-4957 - Restrict number of bit vectors, to prevent out of Java heap memory (Shreepadma
Venugopalan via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1534337)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
* /hive/trunk/ql/src/test/queries/clientnegative/compute_stats_long.q
* /hive/trunk/ql/src/test/results/clientnegative/compute_stats_long.q.out
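
The committed files suggest the change is an argument-validation guard in GenericUDAFComputeStats plus a negative test exercising it. A minimal sketch of that kind of guard is below; the class name, method, limit, and exception are assumptions for illustration, not the actual patch.

{noformat}
// Hypothetical sketch of a bit-vector bound check performed before any
// per-vector memory is allocated; constants and messages are illustrative.
public class ComputeStatsParamCheck {

  // Assumed upper bound; the real patch may use a different limit.
  private static final int MAX_NUM_BIT_VECTORS = 1024;

  /** Fails fast on an out-of-range bit vector count instead of letting
   *  the estimator exhaust the Java heap later. */
  public static int validateNumBitVectors(int requested) {
    if (requested < 1 || requested > MAX_NUM_BIT_VECTORS) {
      throw new IllegalArgumentException(
          "Number of bit vectors must be between 1 and "
              + MAX_NUM_BIT_VECTORS + ", got " + requested);
    }
    return requested;
  }

  public static void main(String[] args) {
    System.out.println(validateNumBitVectors(40));   // accepted
    try {
      validateNumBitVectors(999999999);              // rejected up front
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
{noformat}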


> Restrict number of bit vectors, to prevent out of Java heap memory
> ------------------------------------------------------------------
>
>                 Key: HIVE-4957
>                 URL: https://issues.apache.org/jira/browse/HIVE-4957
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Brock Noland
>            Assignee: Shreepadma Venugopalan
>             Fix For: 0.13.0
>
>         Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch
>
>
> Normally, increasing the number of bit vectors increases calculation accuracy. For example,
> {noformat}
> select compute_stats(a, 40) from test_hive;
> {noformat}
> generally gives better accuracy than
> {noformat}
> select compute_stats(a, 16) from test_hive;
> {noformat}
> But a larger number of bit vectors also makes the query run slower, and beyond roughly 50
> bit vectors accuracy no longer improves. Memory usage, however, keeps growing, and a large
> enough value crashes Hive. Hive currently does not prevent users from passing a ridiculously
> large number of bit vectors to 'compute_stats'.
> One example
> {noformat}
> select compute_stats(a, 999999999) from column_eight_types;
> {noformat}
> crashes Hive.
> {noformat}
> 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
> 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 sec
> MapReduce Total cumulative CPU time: 290 msec
> Ended Job = job_1354923204155_0777 with errors
> Error during job, obtaining debugging information...
> Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
> Examining task ID: task_1354923204155_0777_m_000000 (and more) from job job_1354923204155_0777
> Task with the most failures(4): 
> -----
> Task ID:
>   task_1354923204155_0777_m_000000
> URL:
>   http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_000000
> -----
> Diagnostic Messages for this Task:
> Error: Java heap space
> {noformat}
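
As a rough illustration of why the quoted query runs out of heap: the estimator allocates one bit vector per requested slot, per column, so memory grows linearly with the second argument to compute_stats. The sketch below uses an assumed per-vector cost; the real footprint depends on Hive's internal bit vector representation.

{noformat}
// Back-of-the-envelope memory estimate for compute_stats(a, 999999999).
// The bytes-per-vector figure is an assumption, not Hive's actual layout.
public class BitVectorMemoryEstimate {
  public static void main(String[] args) {
    long numBitVectors = 999_999_999L; // second argument of compute_stats
    long bytesPerVector = 8;           // assumed lower bound per vector
    long perColumnBytes = numBitVectors * bytesPerVector;

    System.out.printf("~%.1f GB per column at %d bytes/vector%n",
        perColumnBytes / (1024.0 * 1024.0 * 1024.0), bytesPerVector);
    // Even at a few bytes per vector this is multiple gigabytes per column,
    // far beyond a typical task heap, hence the "Java heap space" error above.
  }
}
{noformat}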



--
This message was sent by Atlassian JIRA
(v6.1#6144)
