hive-issues mailing list archives

From Ádám Szita (Jira)
Subject [jira] [Commented] (HIVE-22583) LLAP cache always misses with non-vectorized serde readers such as OpenCSV
Date Thu, 12 Dec 2019 16:08:00 GMT


Ádám Szita commented on HIVE-22583:

Thanks for looking into this, [~bslim].

I was thinking that we could run {{llap cache -purge}} at the end of the test case.
That would record the number of bytes that were cached in the q test result file. I don't
think it would interfere with other test cases, since ptest runs them sequentially within a batch.

What's your opinion?

> LLAP cache always misses with non-vectorized serde readers such as OpenCSV
> --------------------------------------------------------------------------
>                 Key: HIVE-22583
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>         Attachments: HIVE-22583.0.patch, HIVE-22583.1.patch, HIVE-22583.2.patch
> After the first read, LLAP cache stores the data of tables that are not using the
LazySimple serde, but the stored data is never used by subsequent queries: each one causes
a full cache miss and re-reads the data.
> The problem is rooted in SerdeEncodedDataReader#cacheFileData, which does not create
an entry for the root/struct column of the table. This is only taken care of when a
vectorized reader is used _(e.g. LazySimpleSerde's LazySimpleDeserializeRead)_, in which case
SerdeEncodedDataReader#processAsyncCacheData creates the entry.
> This can be reproduced either by using a custom serde such as OpenCSV, or by using
LazySimpleSerde but turning off
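To make the failure mode concrete, here is a toy Java model. It is a sketch, not Hive's code. It assumes, as the description implies, that a lookup treats a file whose root/struct column has no cache entry as a complete miss. The class `ToyColumnCache`, its methods, and the convention that column index 0 stands for the root column are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model (NOT Hive's implementation): cached column data for one
 * file, where index 0 plays the role of the root/struct column entry.
 */
public class ToyColumnCache {
    // Cached column data keyed by column index; 0 = root/struct column (assumption).
    private final Map<Integer, String> columns = new HashMap<>();

    /** Stores leaf column data; only some write paths also create the root entry. */
    public void storeColumns(Map<Integer, String> leafColumns, boolean createRootEntry) {
        columns.putAll(leafColumns);
        if (createRootEntry) {
            columns.put(0, "root");
        }
    }

    /** A read is a hit only if the root entry and every requested leaf are cached. */
    public boolean isHit(int... requestedLeaves) {
        if (!columns.containsKey(0)) {
            return false; // no root entry: the whole lookup counts as a miss
        }
        for (int c : requestedLeaves) {
            if (!columns.containsKey(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Models the non-vectorized path: leaf data is cached, root entry is not.
        ToyColumnCache nonVectorized = new ToyColumnCache();
        nonVectorized.storeColumns(Map.of(1, "a", 2, "b"), false);
        System.out.println("non-vectorized path hit: " + nonVectorized.isHit(1, 2));

        // Models the vectorized path: the root entry is created as well.
        ToyColumnCache vectorized = new ToyColumnCache();
        vectorized.storeColumns(Map.of(1, "a", 2, "b"), true);
        System.out.println("vectorized path hit: " + vectorized.isHit(1, 2));
    }
}
```

Running `main` prints `false` for the path that skips the root entry and `true` for the one that creates it. In this toy, the leaf data is present in both cases, so the miss comes purely from the absent root entry.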

This message was sent by Atlassian Jira
