hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-21815) Stats in ORC file are parsed twice
Date Fri, 07 Jun 2019 05:08:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-21815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858313#comment-16858313 ]

Gopal V commented on HIVE-21815:
--------------------------------

bq. Changing the code as mentioned in the previous comment fixes the issue.

I think that would be the best fix - please attach a patch and get it through tests.

> Stats in ORC file are parsed twice
> ----------------------------------
>
>                 Key: HIVE-21815
>                 URL: https://issues.apache.org/jira/browse/HIVE-21815
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>            Reporter: Gopal V
>            Assignee: Krisztian Kasa
>            Priority: Major
>         Attachments: orc-tail-getproto.png, tez-am-2x-protobuf.svg
>
>
> The ORC record reader unnecessarily parses stats twice:
> {code}
>       if (orcTail == null) {
>         Reader orcReader = OrcFile.createReader(file.getPath(),
>             OrcFile.readerOptions(context.conf)
>                 .filesystem(fs)
>                 .maxLength(AcidUtils.getLogicalLength(fs, file)));
>         orcTail = new OrcTail(orcReader.getFileTail(), orcReader.getSerializedFileFooter(),
>             file.getModificationTime());
>         if (context.cacheStripeDetails) {
>           context.footerCache.put(new FooterCacheKey(fsFileId, file.getPath()), orcTail);
>         }
>       }
>       stripes = orcTail.getStripes();
>       stripeStats = orcTail.getStripeStatistics();
> {code}
> We go from Reader -> OrcTail -> StripeStatistics.
> stripeStats is parsed out of the orcTail, even though the same statistics have already been parsed inside orcReader.getStripeStatistics().
> !orc-tail-getproto.png!
>  [^tez-am-2x-protobuf.svg] 
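The Reader -> OrcTail -> StripeStatistics flow described in the issue can be shown with a small self-contained sketch. Everything here (`Tail`, `parseStripeStats`, the `parseCount` counter, the fake "stripe-N" entries) is a hypothetical stand-in, not the Hive or ORC API. The sketch only counts decodes. If the tail is built without the reader's result, the same footer bytes are decoded twice. If the reader's already-decoded statistics are handed to the tail and memoized, they are decoded once.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseOnceSketch {
    // Counts how often the (hypothetical) expensive protobuf decode runs.
    static int parseCount = 0;

    // Hypothetical stand-in for decoding stripe statistics from footer bytes.
    static List<String> parseStripeStats(byte[] serializedFooter) {
        parseCount++;
        List<String> stats = new ArrayList<>();
        for (byte b : serializedFooter) {
            stats.add("stripe-" + b); // fake per-stripe entry
        }
        return stats;
    }

    // Hypothetical tail: uses stats supplied by the reader if present,
    // otherwise decodes lazily and memoizes the result.
    static final class Tail {
        private final byte[] serializedFooter;
        private List<String> stripeStats; // null until supplied or decoded

        Tail(byte[] serializedFooter, List<String> alreadyParsed) {
            this.serializedFooter = serializedFooter;
            this.stripeStats = alreadyParsed;
        }

        List<String> getStripeStatistics() {
            if (stripeStats == null) {
                stripeStats = parseStripeStats(serializedFooter);
            }
            return stripeStats;
        }
    }

    // Pattern from the issue: reader decodes, then the tail decodes again.
    static int countParsesWithoutReuse(byte[] footer) {
        parseCount = 0;
        parseStripeStats(footer);           // reader decodes the stats
        Tail tail = new Tail(footer, null); // tail is not given that result
        tail.getStripeStatistics();         // so it decodes the same bytes again
        return parseCount;
    }

    // Parse-once pattern: hand the reader's decoded stats to the tail.
    static int countParsesWithReuse(byte[] footer) {
        parseCount = 0;
        List<String> readerStats = parseStripeStats(footer);
        Tail tail = new Tail(footer, readerStats);
        tail.getStripeStatistics();
        tail.getStripeStatistics();         // memoized, no further decode
        return parseCount;
    }

    public static void main(String[] args) {
        byte[] footer = {1, 2, 3};
        System.out.println("without reuse: " + countParsesWithoutReuse(footer)); // prints 2
        System.out.println("with reuse: " + countParsesWithReuse(footer));       // prints 1
    }
}
```

The design choice the sketch illustrates is simply that the object which first decodes the statistics should pass them along, rather than passing only the serialized bytes and forcing a second decode downstream.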



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
