drill-dev mailing list archives

From amansinha100 <...@git.apache.org>
Subject [GitHub] drill pull request: DRILL-3735: For partition pruning divide up th...
Date Tue, 15 Sep 2015 05:07:39 GMT
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/156#discussion_r39474948
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/ParquetPartitionDescriptor.java ---
    @@ -125,4 +117,16 @@ private String getBaseTableLocation() {
         final FormatSelection origSelection = (FormatSelection) scanRel.getDrillTable().getSelection();
         return origSelection.getSelection().selectionRoot;
       }
    +
    +  @Override
    +  protected void createPartitionSublists() {
    +    Set<String> fileLocations = ((ParquetGroupScan) scanRel.getGroupScan()).getFileSet();
    +    List<PartitionLocation> locations = new LinkedList<>();
    +    for (String file: fileLocations) {
    +      locations.add(new DFSPartitionLocation(MAX_NESTED_SUBDIRS, getBaseTableLocation(), file));
    --- End diff --
    
    Actually, this patch was not about reducing memory footprint per se. It was meant to
    eliminate the 64K-file limit for partition pruning. The logic of the function above is
    the same as what we had before in getPartitions(), plus the new splitting of the list
    into sublists. The long filenames seem less of an issue for JVM heap usage: suppose we
    have 100K files, each with a name 200 bytes long. That is 20 MB, which is relatively low
    compared to the heap size. However, we should try to build a better framework for
    propagating the filenames throughout the planning process. Right now, methods such as
    FormatSelection.getAsFiles() populate all the filenames at once. Ideally, these could
    also expose an iterator model.
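
    As a rough illustration of the iterator model mentioned above, here is a minimal sketch
    that hands out file names in fixed-size sublists drawn lazily from an underlying
    iterator, instead of materializing the whole list first. The class name
    BatchingIterator and the batch size are hypothetical and are not part of Drill or of
    this patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper (not a Drill class): groups items from a source
// iterator into sublists of at most batchSize elements, built on demand.
class BatchingIterator<T> implements Iterator<List<T>> {
  private final Iterator<T> source;
  private final int batchSize;

  BatchingIterator(Iterator<T> source, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.source = source;
    this.batchSize = batchSize;
  }

  @Override
  public boolean hasNext() {
    // Another sublist exists as long as the source has items left.
    return source.hasNext();
  }

  @Override
  public List<T> next() {
    if (!source.hasNext()) {
      throw new NoSuchElementException();
    }
    // Pull at most batchSize items; the final sublist may be shorter.
    List<T> batch = new ArrayList<>(batchSize);
    while (batch.size() < batchSize && source.hasNext()) {
      batch.add(source.next());
    }
    return batch;
  }

  public static void main(String[] args) {
    // Stand-in file names; a real caller would supply the scan's file set.
    List<String> files = Arrays.asList("f1.parquet", "f2.parquet", "f3.parquet");
    Iterator<List<String>> sublists = new BatchingIterator<>(files.iterator(), 2);
    while (sublists.hasNext()) {
      System.out.println(sublists.next());
    }
  }
}
```

    With this shape, only one sublist of names is held by the consumer at a time, provided
    the source iterator itself produces names lazily.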

