hive-issues mailing list archives

From "Aditya Shah (Jira)" <>
Subject [jira] [Commented] (HIVE-21917) COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
Date Thu, 12 Dec 2019 07:28:00 GMT


Aditya Shah commented on HIVE-21917:

[~pvary] the follow-up fix is HIVE-22625. Could you please take a look at that too?


> COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
> ------------------------------------------------------------------------
>                 Key: HIVE-21917
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.0, 3.1.1
>            Reporter: Craig Condit
>            Assignee: Denys Kuzmenko
>            Priority: Major
>             Fix For: 4.0.0
>         Attachments: HIVE-21917.1.patch, HIVE-21917.2.patch, HIVE-21917.3.patch, HIVE-21917.4.patch,
>                      HIVE-21917.5.patch, HIVE-21917.6.patch
>
> The Initiator thread in the metastore repeatedly loops over entries in the COMPLETED_TXN_COMPONENTS
> table to determine which partitions / tables might need to be compacted. However, entries
> are never removed from this table except by a completed Compactor run.
>
> In a cluster where most tables / partitions are write-once read-many, this results in
> stale entries in this table never being cleaned up. In a small test cluster, we have observed
> approximately 45k entries in this table (virtually equal to the number of partitions in the
> cluster) while < 100 of these tables have delta files at all. Since most of the tables
> will never get enough writes to trigger a compaction (and in fact have only ever been written
> to once), the initiator thread keeps trying to evaluate them on every loop.
>
> On this test cluster, it takes approximately 10 minutes to loop through all the entries
> and results in severe performance degradation on metastore operations. With the default run
> timing of 5 minutes, the initiator basically never stops running.
>
> On a production cluster with 2M partitions, this would be a non-starter.
>
> The initiator thread should proactively remove entries from COMPLETED_TXN_COMPONENTS
> when it determines that a compaction is not needed, so that they are not evaluated again on
> the next loop.
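
The behaviour proposed in the description above can be illustrated with a small in-memory model. This is only a sketch of the idea, not the committed patch and not Hive's actual code: the names Entry, Initiator, run_once, delta_threshold and proactive_cleanup are all hypothetical, and a Python set stands in for the rows of COMPLETED_TXN_COMPONENTS.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for one COMPLETED_TXN_COMPONENTS row."""
    table: str
    partition: str


@dataclass
class Initiator:
    """Toy model of the initiator loop (assumption, not Hive's implementation)."""
    # Entries the initiator still has to evaluate on each pass.
    pending: set = field(default_factory=set)
    # Hypothetical threshold: delta-file count at which compaction is requested.
    delta_threshold: int = 10

    def run_once(self, delta_counts, proactive_cleanup):
        """Run one pass; return (entries evaluated, entries queued for compaction)."""
        evaluated = 0
        queued = []
        # Iterate over a sorted copy so the set can be modified safely.
        for entry in sorted(self.pending, key=lambda e: (e.table, e.partition)):
            evaluated += 1
            if delta_counts.get(entry, 0) >= self.delta_threshold:
                # Compaction is needed; the entry stays until the compactor finishes.
                queued.append(entry)
            elif proactive_cleanup:
                # Compaction is not needed, so drop the entry to avoid
                # re-evaluating it on the next pass.
                self.pending.discard(entry)
        return evaluated, queued
```

With proactive_cleanup=False every pass evaluates the full set again, which is the behaviour reported in this issue. With proactive_cleanup=True the second and later passes only see entries that actually need compaction, so the cost of a pass no longer grows with the number of write-once partitions.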

This message was sent by Atlassian Jira