drill-dev mailing list archives

From paul-rogers <...@git.apache.org>
Subject [GitHub] drill pull request #795: DRILL-5089: Get only partial schemas of relevant st...
Date Sun, 26 Mar 2017 05:46:24 GMT
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/795#discussion_r108051857
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java ---
    @@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) {
         }
       }
     
    +
    +  public SchemaPlus createPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider,
    +                                            final String storage) {
    +    final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName();
    +    final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build();
    +    final SchemaPlus rootSchema = SimpleCalciteSchema.createRootSchema(false);
    +    Set<String> storageSet = Sets.newHashSet();
    +    storageSet.add(storage);
    +    addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet);
    +    schemaTreesToClose.add(rootSchema);
    +    return rootSchema;
    +  }
    +
    +  public SchemaPlus addPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider,
    +                                            Set<String> storages, SchemaPlus rootSchema) {
    +    final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName();
    +    final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build();
    +    addNewStoragesToRootSchema(schemaConfig, rootSchema, storages);
    +    schemaTreesToClose.add(rootSchema);
    +    return rootSchema;
    +  }
    +
    +  private void expandSecondLevelSchema(SchemaPlus parent) {
    --- End diff --
    
    Could you add a comment explaining this method? Why are we expanding the second-level schemas
    of *all* top-level schemas? Can't we expand on the fly as we resolve? That is, if a query
    references the path "a.b.c.d", can't we just resolve a, then resolve b within a, and so on
    until we reach d? Otherwise, we are still open to a performance hit if, say, a is a directory
    containing a million files or a database with 10K tables.
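A minimal sketch of the on-the-fly resolution suggested above. It assumes a hypothetical `LazySchema` node whose children are loaded one name at a time through a caller-supplied loader; `LazySchema`, `resolve`, `simulatedLoader`, and `LOOKUPS` are illustrative names only, not Calcite's `SchemaPlus` API or Drill's `SchemaTreeProvider`. The walk touches exactly one child per path segment, so the cost tracks the length of the path rather than the number of files or tables at each level.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

public class LazySchemaResolution {

  // Hypothetical stand-in for a schema node; NOT Calcite's SchemaPlus or any Drill class.
  static final class LazySchema {
    final String name;
    // Hypothetical callback that looks up ONE child by name (one directory entry, one table)
    // instead of listing every child of the parent.
    private final BiFunction<LazySchema, String, Optional<LazySchema>> loader;
    // Children already resolved through this node.
    private final Map<String, LazySchema> loaded = new HashMap<>();

    LazySchema(String name, BiFunction<LazySchema, String, Optional<LazySchema>> loader) {
      this.name = name;
      this.loader = loader;
    }

    // Calls the loader only the first time a child name is requested; later calls hit the cache.
    Optional<LazySchema> getSubSchema(String childName) {
      LazySchema child = loaded.get(childName);
      if (child == null) {
        Optional<LazySchema> found = loader.apply(this, childName);
        if (!found.isPresent()) {
          return Optional.empty();
        }
        child = found.get();
        loaded.put(childName, child);
      }
      return Optional.of(child);
    }
  }

  // Resolves a dotted path such as "a.b.c.d" one segment at a time.
  static Optional<LazySchema> resolve(LazySchema root, String dottedPath) {
    LazySchema current = root;
    for (String part : dottedPath.split("\\.")) {
      Optional<LazySchema> next = current.getSubSchema(part);
      if (!next.isPresent()) {
        return Optional.empty(); // unknown segment: stop without loading any siblings
      }
      current = next.get();
    }
    return Optional.of(current);
  }

  // Counts how many single-child lookups the simulated storage performs.
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  // Simulated storage in which every name exists; the parent argument is unused here.
  static Optional<LazySchema> simulatedLoader(LazySchema parent, String childName) {
    LOOKUPS.incrementAndGet();
    return Optional.of(new LazySchema(childName, LazySchemaResolution::simulatedLoader));
  }

  public static void main(String[] args) {
    LazySchema root = new LazySchema("", LazySchemaResolution::simulatedLoader);
    String target = resolve(root, "a.b.c.d").map(s -> s.name).orElse("<not found>");
    System.out.println(target + " after " + LOOKUPS.get() + " lookups");
    resolve(root, "a.b.c.d"); // same path again: served entirely from the cached nodes
    System.out.println("total lookups: " + LOOKUPS.get());
  }
}
```

Running `main` prints `d after 4 lookups` and then `total lookups: 4`: resolving the same path a second time is served from the per-node cache, and no sibling of a, b, c, or d is ever listed.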


