drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3921) Hive LIMIT 1 queries take too long
Date Tue, 13 Oct 2015 00:32:05 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954143#comment-14954143

ASF GitHub Bot commented on DRILL-3921:

Github user jacques-n commented on a diff in the pull request:

    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveRecordReader.java
    @@ -223,6 +228,24 @@ private void init() throws ExecutionSetupException {
       public void setup(@SuppressWarnings("unused") OperatorContext context, OutputMutator
           throws ExecutionSetupException {
    +    final ListenableFuture<Void> result = context.runCallableAs(proxyUgi,
    +      new Callable<Void>() {
    +        @Override
    +        public Void call() throws Exception {
    +          init();
    +          return null;
    +        }
    +      });
    +    try {
    +      result.get();
    +    } catch (InterruptedException e) {
    +      result.cancel(true);
    +      // Preserve evidence that the interruption occurred so that code higher up on
    +      // the call stack can learn of the interruption and respond to it if it wants to.
    +      Thread.currentThread().interrupt();
    --- End diff --
    I don't think you want to set the interrupt bit on the proxy thread; you probably need to
    throw the interrupted exception in the original context. I also suggest creating a little
    inner class rather than an anonymous object for easier understanding.
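
    The suggested refactor might look something like the sketch below. All names here
    (`InitTask`, `runAndWait`, the plain `ExecutorService` standing in for Drill's
    `OperatorContext.runCallableAs`) are illustrative assumptions, not Drill's actual API;
    the point is the shape: a named inner class instead of an anonymous `Callable`, and
    `InterruptedException` rethrown to the caller after cancelling the future, rather than
    only re-setting the interrupt bit on the waiting thread.

    ```java
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class SetupSketch {

      // Named inner class instead of an anonymous Callable, per the review comment.
      // InitTask is a hypothetical name; in HiveRecordReader it would invoke init().
      static class InitTask implements Callable<Void> {
        private final Runnable init;

        InitTask(Runnable init) {
          this.init = init;
        }

        @Override
        public Void call() {
          init.run();
          return null;
        }
      }

      // Waits for the task. On interruption it cancels the task and rethrows
      // InterruptedException so the original calling context sees the interruption,
      // instead of only setting the interrupt bit on this thread.
      static void runAndWait(ExecutorService executor, Runnable init)
          throws InterruptedException, ExecutionException {
        Future<Void> result = executor.submit(new InitTask(init));
        try {
          result.get();
        } catch (InterruptedException e) {
          result.cancel(true);
          throw e; // propagate to the original context
        }
      }

      public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        boolean[] ran = {false};
        runAndWait(executor, () -> ran[0] = true);
        executor.shutdown();
        System.out.println("init ran: " + ran[0]);
      }
    }
    ```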

> Hive LIMIT 1 queries take too long
> ----------------------------------
>                 Key: DRILL-3921
>                 URL: https://issues.apache.org/jira/browse/DRILL-3921
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>            Reporter: Sudheesh Katkam
>            Assignee: Sudheesh Katkam
> Fragment initialization on a Hive table (that is backed by a directory of many files)
> can take really long. This is evident through LIMIT 1 queries. The root cause is that the
> underlying reader in the HiveRecordReader is initialized when the ctor is called, rather
> than when setup is called.
> Two changes need to be made:
> 1) lazily initialize the underlying record reader in HiveRecordReader
> 2) allow for running a callable as a proxy user within an operator (through OperatorContext).
> This is required as initialization of the underlying record reader needs to be done as a
> proxy user (proxy for owner of the file). Previously, this was handled while creating the
> record batch tree.

This message was sent by Atlassian JIRA
