drill-issues mailing list archives

From "Sudheesh Katkam (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3921) Hive LIMIT 1 queries takes too long
Date Mon, 12 Oct 2015 16:28:05 GMT
Sudheesh Katkam created DRILL-3921:
--------------------------------------

             Summary: Hive LIMIT 1 queries takes too long
                 Key: DRILL-3921
                 URL: https://issues.apache.org/jira/browse/DRILL-3921
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
            Reporter: Sudheesh Katkam
            Assignee: Sudheesh Katkam


Fragment initialization on a Hive table that is backed by a directory of many files can
take a long time. This is most visible in LIMIT 1 queries, which need only a single row.
The root cause is that the underlying reader in HiveRecordReader is initialized when the
constructor is called, rather than when setup is called.

Two changes need to be made:
1) Lazily initialize the underlying record reader in HiveRecordReader.
2) Allow running a callable as a proxy user within an operator (through OperatorContext).
This is required because the underlying record reader must be initialized as a proxy
user (a proxy for the owner of the file). Previously, this was handled while creating the
record batch tree.
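
The two changes above can be sketched roughly as follows. This is a minimal
illustration, not Drill's actual code: UnderlyingReader, ProxyRunner,
CountingProxyRunner and LazyRecordReader are hypothetical names standing in for
the Hive reader, the proxy-user hook proposed for OperatorContext, and
HiveRecordReader. The constructor only stores references, and the expensive
reader is created inside setup, through the proxy hook.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for the expensive underlying reader.
interface UnderlyingReader {
    String next();
}

// Hypothetical stand-in for the proxy-user hook proposed for OperatorContext.
interface ProxyRunner {
    <T> T runAsProxy(Callable<T> task) throws Exception;
}

// Trivial runner for illustration: it runs the task directly and counts calls.
// A real implementation would switch to the proxy user before calling the task.
final class CountingProxyRunner implements ProxyRunner {
    int calls = 0;

    @Override
    public <T> T runAsProxy(Callable<T> task) throws Exception {
        calls++;
        return task.call();
    }
}

// Hypothetical reader that defers creating the underlying reader until setup().
final class LazyRecordReader {
    private final Callable<UnderlyingReader> factory;
    private final ProxyRunner proxy;
    private UnderlyingReader reader; // stays null until setup() runs

    LazyRecordReader(Callable<UnderlyingReader> factory, ProxyRunner proxy) {
        // The constructor does no I/O; it only remembers how to build the reader.
        this.factory = factory;
        this.proxy = proxy;
    }

    void setup() {
        if (reader != null) {
            return; // already initialized
        }
        try {
            // Initialization happens here, as the proxy user.
            reader = proxy.runAsProxy(factory);
        } catch (Exception e) {
            throw new RuntimeException("Failed to initialize underlying reader", e);
        }
    }

    String next() {
        if (reader == null) {
            throw new IllegalStateException("setup() has not been called");
        }
        return reader.next();
    }
}
```

With this shape, constructing many LazyRecordReader instances is cheap, and
only the readers whose setup is actually called pay the initialization cost.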




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
