hive-issues mailing list archives

From Ádám Szita (Jira) <j...@apache.org>
Subject [jira] [Updated] (HIVE-22705) LLAP cache is polluted by query-based compactor
Date Tue, 21 Jan 2020 08:39:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-22705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ádám Szita updated HIVE-22705:
------------------------------
    Attachment: HIVE-22705.2.patch

> LLAP cache is polluted by query-based compactor
> -----------------------------------------------
>
>                 Key: HIVE-22705
>                 URL: https://issues.apache.org/jira/browse/HIVE-22705
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>         Attachments: HIVE-22705.0.patch, HIVE-22705.1.patch, HIVE-22705.2.patch
>
>
> One of the steps of query-based compaction is verifying the ACID sort order with the _validate_acid_sort_order_ UDF. This is a prerequisite for the actual compaction and is done by a [query that reads the whole table content|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MajorQueryCompactor.java#L161-L167].
> As a result, the whole table content is loaded into the LLAP cache. This content is not useful and only pollutes the cache space, because it can never be read again: cache entries are bound to files (file IDs), and compaction replaces exactly those files with new ones.
> I propose we disable LLAP caching in the sessions that run query-based compaction's queries.
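The pollution mechanism and the proposed fix can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Hive's LLAP code or the attached patch: FileIdCache, its cachingEnabled parameter, simulate() and the file IDs 1-4 are all illustrative assumptions. It models a cache keyed only by file ID, a validation scan over the pre-compaction files, a compaction that writes a file with a new ID, and a later user query that reads only that new file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical toy model of a data cache keyed by file ID (not Hive's LLAP implementation). */
public class FileIdCacheSketch {

    /** Hypothetical cache: file ID -> cached file contents. */
    public static final class FileIdCache {
        private final Map<Long, byte[]> entries = new HashMap<>();
        private int hits;
        private int misses;

        /**
         * Reads a file through the cache. When cachingEnabled is false the read
         * bypasses the cache entirely: nothing is looked up and nothing is stored.
         */
        public byte[] read(long fileId, Function<Long, byte[]> loader, boolean cachingEnabled) {
            if (!cachingEnabled) {
                return loader.apply(fileId);
            }
            byte[] cached = entries.get(fileId);
            if (cached != null) {
                hits++;
                return cached;
            }
            misses++;
            byte[] data = loader.apply(fileId);
            entries.put(fileId, data);
            return data;
        }

        public int size() { return entries.size(); }
        public int hits() { return hits; }
        public int misses() { return misses; }
    }

    /**
     * Simulates one compaction cycle. cacheDuringCompaction models whether the
     * compactor's session (the hypothetical flag) is allowed to populate the cache.
     */
    public static FileIdCache simulate(boolean cacheDuringCompaction) {
        FileIdCache cache = new FileIdCache();
        Function<Long, byte[]> loader = id -> new byte[1024]; // stand-in for reading a file
        long[] preCompactionFiles = {1L, 2L, 3L};

        // The validation query scans every pre-compaction file.
        for (long id : preCompactionFiles) {
            cache.read(id, loader, cacheDuringCompaction);
        }

        // Compaction writes a new file with a fresh ID; files 1-3 become obsolete.
        long compactedFile = 4L;

        // A later user query reads only the compacted file, with caching enabled.
        cache.read(compactedFile, loader, true);
        return cache;
    }

    public static void main(String[] args) {
        FileIdCache polluted = simulate(true);
        System.out.println("caching on:  entries=" + polluted.size() + " hits=" + polluted.hits());
        FileIdCache clean = simulate(false);
        System.out.println("caching off: entries=" + clean.size() + " hits=" + clean.hits());
    }
}
```

With caching on, the validation scan leaves three entries for files that compaction has made obsolete, and none of them is ever hit; with caching bypassed for the compaction session, only the entry for the compacted file remains.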



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
