drill-dev mailing list archives

From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-2553) Cost calculation fails to properly choose single file scan in favor of a multi-file scan when files are small
Date Wed, 25 Mar 2015 01:49:52 GMT
Jason Altekruse created DRILL-2553:
--------------------------------------

             Summary: Cost calculation fails to properly choose single file scan in favor of a multi-file scan when files are small
                 Key: DRILL-2553
                 URL: https://issues.apache.org/jira/browse/DRILL-2553
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 0.8.0
            Reporter: Jason Altekruse
            Assignee: Aman Sinha


The patch for constant folding, which should be checked in soon, includes a failing test case.
The test attempts to prune one directory out of a scan after a constant expression returning
the name of a directory is folded, but the files being read from both directories are very
small. With our current method of calculating cost, the pruned and unpruned plans report the
same cost, so the planner has no basis for preferring the pruned plan.

This could be fixed in a few different places. EasyGroupScan.getScanStats(), which is used
here, could factor the file count into its calculation of the total row count. Alternatively,
we could move to a two-part metric that tracks the number of files in addition to the
estimated row count. This would require some changes to the cost calculation of the scan rels
themselves, which use the information from the scan stats. In general, I think we should
consider solving this as high up as possible: we want our cost estimates to be as close to
optimal as possible, even if the information provided by storage plugins is not completely
accurate.
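
To make the two options concrete, here is a minimal standalone sketch. It does not use Drill's
classes or reproduce the actual EasyGroupScan.getScanStats() logic. Every name in it
(ScanCostSketch, ScanStats, PER_FILE_OVERHEAD_ROWS, the bytes-per-row divisor) is hypothetical,
and the way the size-only estimate collapses to a floor value is an assumption chosen only to
reproduce the "same cost" symptom with tiny files.

```java
// Hypothetical illustration only; none of these names exist in Drill.
final class ScanCostSketch {

    // Assumed per-file overhead, expressed in "row equivalents" (made-up value).
    static final double PER_FILE_OVERHEAD_ROWS = 100.0;

    // Assumed floor for any row count estimate (made-up value).
    static final double MIN_ROWS = 1.0;

    // Size-only estimate: total bytes divided by an assumed row width.
    // For very small files the integer division yields 0, so the estimate
    // hits the floor regardless of how many files are scanned.
    static double sizeOnlyRowCount(long[] fileSizes, long bytesPerRow) {
        long totalBytes = 0;
        for (long size : fileSizes) {
            totalBytes += size;
        }
        return Math.max(MIN_ROWS, (double) (totalBytes / bytesPerRow));
    }

    // Option 1: fold the file count into the row count estimate.
    static double fileAwareRowCount(long[] fileSizes, long bytesPerRow) {
        return sizeOnlyRowCount(fileSizes, bytesPerRow)
                + PER_FILE_OVERHEAD_ROWS * fileSizes.length;
    }

    // Option 2: a two-part metric that keeps the file count separate.
    static final class ScanStats {
        final double rowCount;
        final int fileCount;

        ScanStats(double rowCount, int fileCount) {
            this.rowCount = rowCount;
            this.fileCount = fileCount;
        }
    }

    // Compare by row count first; break ties on the number of files.
    // A negative result means 'a' is the cheaper scan.
    static int compare(ScanStats a, ScanStats b) {
        int byRows = Double.compare(a.rowCount, b.rowCount);
        return byRows != 0 ? byRows : Integer.compare(a.fileCount, b.fileCount);
    }

    public static void main(String[] args) {
        long[] unpruned = {10L, 12L}; // two tiny files, one per directory
        long[] pruned = {10L};        // one directory pruned away
        long bytesPerRow = 1024L;     // assumed row width

        System.out.println("size-only:  " + sizeOnlyRowCount(unpruned, bytesPerRow)
                + " vs " + sizeOnlyRowCount(pruned, bytesPerRow));
        System.out.println("file-aware: " + fileAwareRowCount(unpruned, bytesPerRow)
                + " vs " + fileAwareRowCount(pruned, bytesPerRow));

        ScanStats a = new ScanStats(sizeOnlyRowCount(pruned, bytesPerRow), pruned.length);
        ScanStats b = new ScanStats(sizeOnlyRowCount(unpruned, bytesPerRow), unpruned.length);
        System.out.println("two-part metric prefers pruned: " + (compare(a, b) < 0));
    }
}
```

In this sketch the size-only estimate is identical for both plans, while either alternative
separates them. The first option needs no changes outside the scan stats; the second keeps the
row estimate untouched but requires every consumer of the stats to understand the extra field,
which matches the trade-off described above.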



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
