hive-dev mailing list archives

From "Pengcheng Xiong (JIRA)" <>
Subject [jira] [Updated] (HIVE-8061) improve the speed of col stats update speed
Date Thu, 11 Sep 2014 21:12:33 GMT


Pengcheng Xiong updated HIVE-8061:
    Attachment: HIVE-8061.1.patch

Major improvements:
(1) All partition stats updates/inserts are now done in one transaction.
(2) Rather than issuing one update query per column per partition (total queries = #cols * #partitions),
we now use one query to delete everything and then one query to insert everything. The transaction
makes sure that the delete and the insert are applied together, with ACID guarantees.
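
For illustration only, here is a minimal sketch of the delete-then-insert pattern described above. It is not the code from HIVE-8061.1.patch: it uses Python's sqlite3 module in place of the metastore database, and the table name part_col_stats, its columns, and the function replace_col_stats are all hypothetical.

```python
import sqlite3


def replace_col_stats(conn, table_name, part_names, rows):
    """Replace column stats for the given partitions in one transaction.

    rows is a list of (partition_name, column_name, num_nulls) tuples.
    The schema used here is hypothetical, not the real metastore schema.
    """
    if not part_names:
        return
    placeholders = ",".join("?" for _ in part_names)
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the delete and the insert are applied together or not at all.
    with conn:
        # One statement removes the old stats for every listed partition.
        conn.execute(
            "DELETE FROM part_col_stats "
            f"WHERE table_name = ? AND partition_name IN ({placeholders})",
            [table_name, *part_names],
        )
        # One prepared statement inserts the new stats for all columns
        # and partitions (executed once per row by the driver).
        conn.executemany(
            "INSERT INTO part_col_stats "
            "(table_name, partition_name, column_name, num_nulls) "
            "VALUES (?, ?, ?, ?)",
            [(table_name, p, c, n) for p, c, n in rows],
        )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE part_col_stats ("
        "table_name TEXT, partition_name TEXT, "
        "column_name TEXT, num_nulls INTEGER)"
    )
    rows = [("ds=1", "a", 0), ("ds=1", "b", 2),
            ("ds=2", "a", 1), ("ds=2", "b", 0)]
    # Running the update twice must not leave duplicate stats behind.
    replace_col_stats(conn, "t", ["ds=1", "ds=2"], rows)
    replace_col_stats(conn, "t", ["ds=1", "ds=2"], rows)
    return conn.execute("SELECT COUNT(*) FROM part_col_stats").fetchone()[0]
```

Because the old rows are deleted before the new ones are inserted, repeating the update leaves one row per column per partition, and a failure part-way through rolls back to the previous stats.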

> improve the speed of col stats update speed
> -------------------------------------------
>                 Key: HIVE-8061
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>            Priority: Minor
>         Attachments: HIVE-8061.1.patch
> We worked hard towards faster update stats for columns of a partition of a table previously
> and
> Although there was some improvement, the result is only correct on the first run. Later
> runs produce duplicate column stats. Thanks to Eugene Koifman for his comments.
> We fixed this in by reverting the patch.
> This JIRA ticket is another attempt to improve the speed.

This message was sent by Atlassian JIRA
