drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
Date Sun, 26 Mar 2017 05:46:43 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942158#comment-15942158 ]

ASF GitHub Bot commented on DRILL-5089:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/795#discussion_r108051857
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java ---
    @@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) {
         }
       }
     
    +
    +  public SchemaPlus createPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider,
    +                                            final String storage) {
    +    final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName();
    +    final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build();
    +    final SchemaPlus rootSchema = SimpleCalciteSchema.createRootSchema(false);
    +    Set<String> storageSet = Sets.newHashSet();
    +    storageSet.add(storage);
    +    addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet);
    +    schemaTreesToClose.add(rootSchema);
    +    return rootSchema;
    +  }
    +
    +  public SchemaPlus addPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider,
    +                                            Set<String> storages, SchemaPlus rootSchema) {
    +    final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName();
    +    final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build();
    +    addNewStoragesToRootSchema(schemaConfig, rootSchema, storages);
    +    schemaTreesToClose.add(rootSchema);
    +    return rootSchema;
    +  }
    +
    +  private void expandSecondLevelSchema(SchemaPlus parent) {
    --- End diff --
    
    Maybe explain this a bit? Why are we expanding second-level schemas for *all* top-level
schemas? Can't we do the expansion on the fly as we resolve? That is, if a query has a path
"a.b.c.d", can't we just resolve a, then within a, resolve b, and so on until we get to d?
Else, we are still open to a performance hit if, say, a is a directory of a million files,
or a database with 10K tables.
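
A minimal sketch of the on-the-fly resolution suggested above, not the code in this pull request. `LazyNode`, `resolve`, `node`, and `sampleTree` are hypothetical names invented for illustration; they are stand-ins and not part of Drill's `SchemaTreeProvider` or Calcite's `SchemaPlus`. Each node resolves a single child by name on demand, so a path such as "a.b.c.d" costs one lookup per segment and never enumerates siblings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class LazySchemaResolution {

  /** Hypothetical schema node; a stand-in, not Drill's or Calcite's SchemaPlus. */
  static final class LazyNode {
    final String name;
    private final Function<String, LazyNode> lookup;
    private final Map<String, LazyNode> resolved = new HashMap<>();

    LazyNode(String name, Function<String, LazyNode> lookup) {
      this.name = name;
      this.lookup = lookup;
    }

    /** Resolves one child by name and caches it; siblings are never enumerated. */
    LazyNode child(String childName) {
      return resolved.computeIfAbsent(childName, lookup);
    }
  }

  /** Walks a dotted path one level at a time; returns null if any segment is missing. */
  static LazyNode resolve(LazyNode root, String dottedPath) {
    LazyNode current = root;
    for (String segment : dottedPath.split("\\.")) {
      current = current.child(segment);
      if (current == null) {
        return null;
      }
    }
    return current;
  }

  /** Builds a node whose lookups are recorded in log, standing in for an expensive plugin call. */
  static LazyNode node(String name, Map<String, LazyNode> children, List<String> log) {
    return new LazyNode(name, childName -> {
      log.add(name + "." + childName);
      return children.get(childName);
    });
  }

  /** Assumed sample tree: root -> {a -> b -> c -> d, x -> y}. */
  static LazyNode sampleTree(List<String> log) {
    LazyNode d = node("d", Collections.<String, LazyNode>emptyMap(), log);
    LazyNode c = node("c", Collections.singletonMap("d", d), log);
    LazyNode b = node("b", Collections.singletonMap("c", c), log);
    LazyNode a = node("a", Collections.singletonMap("b", b), log);
    LazyNode y = node("y", Collections.<String, LazyNode>emptyMap(), log);
    LazyNode x = node("x", Collections.singletonMap("y", y), log);
    Map<String, LazyNode> top = new HashMap<>();
    top.put("a", a);
    top.put("x", x);
    return node("root", top, log);
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    LazyNode found = resolve(sampleTree(log), "a.b.c.d");
    System.out.println(found.name + " resolved with lookups " + log);
  }
}
```

In this sketch the top-level schema `x` is never consulted when resolving "a.b.c.d", which is the property the comment asks for. Whether the planner can defer expansion this way in Drill depends on how Calcite walks the schema tree, which this example does not model.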


> Skip initializing all enabled storage plugins for every query
> -------------------------------------------------------------
>
>                 Key: DRILL-5089
>                 URL: https://issues.apache.org/jira/browse/DRILL-5089
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 1.9.0
>            Reporter: Abhishek Girish
>            Assignee: Chunhui Shi
>            Priority: Critical
>
> In a query's lifecycle, an attempt is made to initialize each enabled storage plugin while building the schema tree. This is done regardless of the actual plugins involved in the query.
> Sometimes, when one or more of the enabled storage plugins have issues, either due to misconfiguration or because the underlying datasource is slow or down, the overall query time increases drastically. This is most likely due to the attempt to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server and the underlying SQL Server db goes down, any Drill query that starts executing from that point on slows down drastically.
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)