hive-dev mailing list archives

From "Phabricator (JIRA)" <>
Subject [jira] [Commented] (HIVE-4421) Improve memory usage by ORC dictionaries
Date Thu, 02 May 2013 05:24:17 GMT


Phabricator commented on HIVE-4421:

omalley has commented on the revision "HIVE-4421 [jira] Improve memory usage by ORC dictionaries".

  Ashutosh, I incorporated most of your input. The 5000 rows between memory checks is simply
how often we compare each writer's memory use against the size of its allocation. If there is
enough memory, the check doesn't result in any IO. I don't think it would see enough use to
justify making it into a HiveConf variable.
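
As a rough illustration of the periodic check described above, here is a minimal sketch. It is not the code from the patch: the class names, the `simulate` helper, and the byte figures are hypothetical, and only the 5000-row interval comes from the comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a periodic memory check; not Hive's actual implementation. */
class MemoryCheckSketch {
    /** Rows added between checks (the figure mentioned in the comment). */
    static final int ROWS_BETWEEN_CHECKS = 5000;

    /** Hypothetical writer that tracks estimated buffered bytes against its allocation. */
    static class Writer {
        final long allocationBytes;
        long bufferedBytes = 0;
        int flushes = 0;

        Writer(long allocationBytes) { this.allocationBytes = allocationBytes; }

        void addRow(long rowBytes) { bufferedBytes += rowBytes; }

        /** Flushing is the only step that would cause IO; it happens only when over allocation. */
        void checkMemory() {
            if (bufferedBytes > allocationBytes) {
                flushes++;
                bufferedBytes = 0;
            }
        }
    }

    /** Hypothetical manager that counts rows and checks every registered writer periodically. */
    static class Manager {
        final List<Writer> writers = new ArrayList<>();
        int rowsSinceCheck = 0;
        int checks = 0;

        void register(Writer w) { writers.add(w); }

        void addedRow() {
            if (++rowsSinceCheck >= ROWS_BETWEEN_CHECKS) {
                rowsSinceCheck = 0;
                checks++;
                for (Writer w : writers) {
                    w.checkMemory();
                }
            }
        }
    }

    /** Writes `rows` rows of `rowBytes` each to one writer; returns {checks, flushes}. */
    static int[] simulate(int rows, long rowBytes, long allocationBytes) {
        Manager m = new Manager();
        Writer w = new Writer(allocationBytes);
        m.register(w);
        for (int i = 0; i < rows; i++) {
            w.addRow(rowBytes);
            m.addedRow();
        }
        return new int[] {m.checks, w.flushes};
    }
}
```

In this sketch a writer that stays under its allocation is checked but never flushed, which matches the point that the check itself causes no IO when memory is sufficient.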

  You asked why I removed the countOutput. The answer is that we had no immediate plans to
use it, its use case was relatively rare, and removing it saved some memory and complexity.


To: JIRA, ashutoshc, omalley

> Improve memory usage by ORC dictionaries
> ----------------------------------------
>                 Key: HIVE-4421
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.11.0
>         Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, HIVE-4421.D10545.3.patch,
>
> Currently, for tables with many string columns, it is possible to significantly underestimate
> the memory used by the ORC dictionaries and cause the query to run out of memory in the task.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see:
