hive-dev mailing list archives

From "Vikram Dixit K (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6262) Remove unnecessary copies of schema + table desc from serialized plan
Date Mon, 27 Jan 2014 21:55:38 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883389#comment-13883389 ]

Vikram Dixit K commented on HIVE-6262:
--------------------------------------

This is really good in terms of memory efficiency. LGTM +1.

> Remove unnecessary copies of schema + table desc from serialized plan
> ---------------------------------------------------------------------
>
>                 Key: HIVE-6262
>                 URL: https://issues.apache.org/jira/browse/HIVE-6262
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Gunther Hagleitner
>         Attachments: HIVE-6262.1.patch
>
>
> Currently for a partitioned table the following are true:
> - for each partitiondesc we send a copy of the corresponding tabledesc
> - for each partitiondesc we send two copies of the schema (in different formats).
> Obviously we need to send different schemas if they are required by schema evolution,
> but in our case we'll always end up with multiple copies.
> The effect can be dramatic. Removing those copies can easily reduce the plan size for
> partitioned tables by 8-10x. Plans themselves can be tens to hundreds of MB (even with
> kryo). The size difference also plays out in every task we run on the cluster.
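
To illustrate the duplication described in the quoted issue, here is a minimal
sketch, not code from HIVE-6262.1.patch. It assumes hypothetical TableInfo and
PartitionInfo classes (stand-ins, not Hive's TableDesc/PartitionDesc) and uses
plain Java serialization rather than kryo. It compares the serialized size of a
plan in which every partition entry carries its own copy of the table
description against one in which all entries reference a single shared
instance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Hive classes.
class PlanSizeSketch {

    /** Table-level metadata, including a (potentially wide) schema string. */
    static class TableInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String schema;

        TableInfo(String name, String schema) {
            this.name = name;
            this.schema = schema;
        }
    }

    /** Per-partition metadata that points at its table's metadata. */
    static class PartitionInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String location;
        final TableInfo table;

        PartitionInfo(String location, TableInfo table) {
            this.location = location;
            this.table = table;
        }
    }

    /** Builds a fresh schema string such as "col0:string,col1:string,...". */
    static String wideSchema(int columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("col").append(i).append(":string");
        }
        return sb.toString();
    }

    /**
     * Serializes a list of partition entries and returns the byte count.
     * With shareTableInfo == false, each entry gets its own TableInfo copy.
     */
    static int serializedSize(int partitions, boolean shareTableInfo) {
        TableInfo shared = new TableInfo("t", wideSchema(50));
        List<PartitionInfo> plan = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            TableInfo table = shareTableInfo
                    ? shared
                    : new TableInfo("t", wideSchema(50));
            plan.add(new PartitionInfo("/warehouse/t/part=" + i, table));
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // Java serialization writes each distinct object once and uses
            // back-references for repeats, so sharing shrinks the output.
            out.writeObject(plan);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("copied: " + serializedSize(100, false) + " bytes");
        System.out.println("shared: " + serializedSize(100, true) + " bytes");
    }
}
```

The exact ratio depends on the schema width, the number of partitions, and the
serializer, so the numbers printed here should not be read as a reproduction of
the 8-10x figure quoted above.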



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
