hive-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2988) Use of XMLEncoder to serialize MapredWork causes OOM in hive cli
Date Mon, 30 Apr 2012 18:15:50 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265073#comment-13265073 ]

Rohini Palaniswamy commented on HIVE-2988:
------------------------------------------

I ran with 128M to investigate the OOM. We have resorted to running with a 1G max heap (-Xmx)
because we keep hitting OOM with bigger tables in Hive. Other things contributed to the
memory usage as well, mostly Path objects because of the high number of partitions, but those
are absolutely needed. XMLEncoder, on the other hand, created too much garbage in a very short
span and caused GC. That would be easy to change or fix without having to touch the
core logic.
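
To make the XMLEncoder cost easier to reproduce, here is a minimal Java sketch. It is not
the Hive code: it assumes a hypothetical stand-in for the plan (a map from made-up partition
paths to alias lists), and the names XmlEncoderSketch, buildPlan and the /warehouse/t/ds=N
paths are illustrative only. It serializes that map with java.beans.XMLEncoder so the size
of the XML output can be inspected for a given partition count.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

public class XmlEncoderSketch {

    // Hypothetical stand-in for the per-partition path data in a query plan.
    // The real MapredWork holds much more; this only mimics "one entry per partition".
    public static Map<String, ArrayList<String>> buildPlan(int partitions) {
        Map<String, ArrayList<String>> pathToAliases = new LinkedHashMap<>();
        for (int i = 0; i < partitions; i++) {
            ArrayList<String> aliases = new ArrayList<>();
            aliases.add("t");
            pathToAliases.put("/warehouse/t/ds=" + i, aliases);
        }
        return pathToAliases;
    }

    // Serialize any object graph XMLEncoder understands into an in-memory XML document.
    public static byte[] encode(Object plan) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(plan);
        }
        return out.toByteArray();
    }

    // Read the object graph back, to check the XML round-trips.
    public static Object decode(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        // 6000 matches the partition count mentioned in the issue description.
        int partitions = args.length > 0 ? Integer.parseInt(args[0]) : 6000;
        Map<String, ArrayList<String>> plan = buildPlan(partitions);
        byte[] xml = encode(plan);
        System.out.println("partitions=" + partitions + ", xml bytes=" + xml.length);
        System.out.println("round trip ok=" + plan.equals(decode(xml)));
    }
}
```

Running it with a small heap (for example -Xmx128m) and varying the partition count gives
a rough feel for how the encoder behaves. The numbers will differ from a real MapredWork,
so treat them as indicative only.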

We should fix the root cause of the problem instead of continuing to increase
the memory requirements. Ours is a highly multi-tenant system, and a lot of other programs
(Pig, etc.) run on the gateway too, so running with lower memory (256-512MB) will help.

Found two other reports of this issue:
  http://mail-archives.apache.org/mod_mbox/hive-user/201106.mbox/%3CBANLkTik4THLNkxV87UygvqhoLri3UL9R3Q@mail.gmail.com%3E

  https://issues.apache.org/jira/browse/HIVE-1316
   - This fix increased the max heap size of the CLI client and disabled the GC overhead limit.

> Use of XMLEncoder to serialize MapredWork causes OOM in hive cli
> ----------------------------------------------------------------
>
>                 Key: HIVE-2988
>                 URL: https://issues.apache.org/jira/browse/HIVE-2988
>             Project: Hive
>          Issue Type: Improvement
>          Components: CLI
>            Reporter: Rohini Palaniswamy
>              Labels: Performance
>
> When running queries on tables with 6000 partitions, the Hive CLI, if configured with 128M,
> runs into OOM. A heap dump showed 37MB occupied by one XMLEncoder object while the MapredWork
> was 500K, which is highly inefficient. We should switch to something more efficient, like
> XStream.


        
