hive-dev mailing list archives

From "Abhishek Somani (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-15390) Orc reader unnecessarily reading stripe footers with hive.optimize.index.filter set to true
Date Thu, 08 Dec 2016 15:35:58 GMT
Abhishek Somani created HIVE-15390:
--------------------------------------

             Summary: Orc reader unnecessarily reading stripe footers with hive.optimize.index.filter set to true
                 Key: HIVE-15390
                 URL: https://issues.apache.org/jira/browse/HIVE-15390
             Project: Hive
          Issue Type: Bug
          Components: ORC
    Affects Versions: 1.2.1
            Reporter: Abhishek Somani
            Assignee: Abhishek Somani


Within a split assigned to a task, the task's ORC reader unnecessarily reads stripe footers
for stripes that it is not responsible for reading. This happens when hive.optimize.index.filter
is set to true.

Assuming one split per task (Tez grouping not considered), a task should not need to read beyond
the split's end offset. Even under split computation strategies where a split's end offset
can fall in the middle of a stripe, the task should not need to read more than one stripe beyond
the split's end offset (to finish reading a stripe that started inside the split). However, I see
that some tasks make unnecessary filesystem calls to read every stripe footer in the file, from
the split start offset to the end of the file.
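
To make the expected and observed behaviour concrete, here is a minimal sketch in Python. It is not Hive or ORC code: the Stripe class, both function names, and the offsets are hypothetical, and it assumes a stripe belongs to the split that contains the stripe's starting offset.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stripe:
    """Hypothetical stand-in for a stripe's position in the file."""
    offset: int   # byte offset at which the stripe starts
    length: int   # total stripe length in bytes


def stripes_owned_by_split(stripes: List[Stripe], split_start: int,
                           split_length: int) -> List[Stripe]:
    """Expected behaviour (assumption): only stripes that start inside
    [split_start, split_end) need their footers read by this task."""
    split_end = split_start + split_length
    return [s for s in stripes if split_start <= s.offset < split_end]


def stripes_from_split_start_to_eof(stripes: List[Stripe],
                                    split_start: int) -> List[Stripe]:
    """Behaviour described in this report: footers are read for every
    stripe from the split start offset to the end of the file."""
    return [s for s in stripes if s.offset >= split_start]


# Four stripes of 100 bytes each; the split covers bytes [100, 250),
# so its end offset falls in the middle of the stripe starting at 200.
stripes = [Stripe(0, 100), Stripe(100, 100), Stripe(200, 100), Stripe(300, 100)]
expected = stripes_owned_by_split(stripes, split_start=100, split_length=150)
reported = stripes_from_split_start_to_eof(stripes, split_start=100)
extra = [s for s in reported if s not in expected]
```

In this example the task would be expected to read the footers of the stripes starting at 100 and 200 only; the stripe starting at 300 is the kind of extra footer read this issue is about.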



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
