drill-issues mailing list archives

From "Nathaniel Auvil (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3897) Partitions not being pruned
Date Tue, 06 Oct 2015 13:15:26 GMT
Nathaniel Auvil created DRILL-3897:
--------------------------------------

             Summary: Partitions not being pruned
                 Key: DRILL-3897
                 URL: https://issues.apache.org/jira/browse/DRILL-3897
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Nathaniel Auvil


I have a two-level partitioning structure (dir0 = server, dir1 = day ID). Drill is not pruning
partitions correctly: it reads all files under every directory. My source files are tab-delimited.

My query:
select dir0 server, dir1 dayId, max(LENGTH(columns[2])) maxSize
from dfs.`/archive/psn`
where dir1 >= 20151001
group by dir0, dir1
order by maxSize
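Because the query filters only on dir1, partition pruning should let Drill skip every dir1 directory whose name fails the predicate before any file is scanned. The Python sketch below illustrates that expected behavior. It is a conceptual model, not Drill's implementation. The in-memory listing and the helper name `prune_partitions` are hypothetical, and only the 20150130 directory actually appears in this report.

```python
from typing import Iterable


def prune_partitions(
    partitions: Iterable[tuple[str, str]], min_day: int
) -> list[tuple[str, str]]:
    """Keep only (dir0, dir1) partitions whose dir1 (a yyyymmdd name) is >= min_day.

    Hypothetical helper: models what pruning on `dir1 >= 20151001` is expected
    to do, not how Drill performs it internally.
    """
    kept = []
    for dir0, dir1 in partitions:
        # Directory names are strings; compare numerically, matching the
        # integer literal used in the query's WHERE clause.
        if dir1.isdigit() and int(dir1) >= min_day:
            kept.append((dir0, dir1))
    return kept


# Hypothetical listing of /archive/psn/<server>/<dayId> directories.
listing = [
    ("PAWCHSCCOMPA2", "20150130"),
    ("PAWCHSCCOMPA2", "20151001"),
    ("PAWCHSCCOMPA2", "20151005"),
]
print(prune_partitions(listing, 20151001))
```

Under this model, only the 20151001 and 20151005 directories would be read, and the 20150130 directory would never reach the Scan operator.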

Plan snippet showing Drill reading unneeded files:


00-00    Screen : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8,
cumulative cost = {4.898214127009591E11 rows, 3.373451719812133E12 cpu, 0.0 io, 9.966863946928127E13
network, 1.51590434033232E12 memory}, id = 44973
00-01      Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY server,
ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.897696400117401E11
rows, 3.3733999471229136E12 cpu, 0.0 io, 9.966863946928127E13 network, 1.51590434033232E12
memory}, id = 44972
00-02        SingleMergeExchange(sort0=[2 ASC]) : rowType = RecordType(ANY server, ANY dayId,
ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.897696400117401E11 rows,
3.3733999471229136E12 cpu, 0.0 io, 9.966863946928127E13 network, 1.51590434033232E12 memory},
id = 44971
01-01          SelectionVectorRemover : rowType = RecordType(ANY server, ANY dayId, ANY maxSize):
rowcount = 5.1772689218999994E8, cumulative cost = {4.892519131195501E11 rows, 3.3589035941415938E12
cpu, 0.0 io, 9.330681141805055E13 network, 1.51590434033232E12 memory}, id = 44970
01-02            Sort(sort0=[$2], dir0=[ASC]) : rowType = RecordType(ANY server, ANY dayId,
ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.887341862273601E11 rows,
3.358385867249404E12 cpu, 0.0 io, 9.330681141805055E13 network, 1.51590434033232E12 memory},
id = 44969
01-03              Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY
server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.882164593351701E11
rows, 3.2984380301424897E12 cpu, 0.0 io, 9.330681141805055E13 network, 1.50347889491976E12
memory}, id = 44968
01-04                HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY server,
ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689218999994E8, cumulative
cost = {4.882164593351701E11 rows, 3.2984380301424897E12 cpu, 0.0 io, 9.330681141805055E13
network, 1.50347889491976E12 memory}, id = 44967
02-01                  UnorderedMuxExchange : rowType = RecordType(ANY server, ANY dayId,
ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689218999994E8, cumulative cost
= {4.876987324429801E11 rows, 3.2901543998674497E12 cpu, 0.0 io, 8.48243740164096E13 network,
1.50347889491976E12 memory}, id = 44966
03-01                    Project(server=[$0], dayId=[$1], maxSize=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($2))])
: rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D):
rowcount = 5.1772689218999994E8, cumulative cost = {4.871810055507901E11 rows, 3.28963667297526E12
cpu, 0.0 io, 8.48243740164096E13 network, 1.50347889491976E12 memory}, id = 44965
03-02                      HashAgg(group=[{0, 1}], maxSize=[MAX($2)]) : rowType = RecordType(ANY
server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.866632786586001E11
rows, 3.2875657654065E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.50347889491976E12 memory},
id = 44964
03-03                        Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY
server, ANY dayId, ANY maxSize): rowcount = 5.1772689219E9, cumulative cost = {4.814860097367001E11
rows, 3.1426022355933E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.3667989953816E12 memory},
id = 44963
03-04                          HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType
= RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount
= 5.1772689219E9, cumulative cost = {4.814860097367001E11 rows, 3.1426022355933E12 cpu, 0.0
io, 8.48243740164096E13 network, 1.3667989953816E12 memory}, id = 44962
04-01                            UnorderedMuxExchange : rowType = RecordType(ANY server, ANY
dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689219E9, cumulative
cost = {4.7630874081480005E11 rows, 3.0804750085305E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12
memory}, id = 44961
05-01                              Project(server=[$0], dayId=[$1], maxSize=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1,
hash64AsDouble($0)))]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D):
rowcount = 5.1772689219E9, cumulative cost = {4.711314718929E11 rows, 3.0752977396086E12 cpu,
0.0 io, 0.0 network, 1.3667989953816E12 memory}, id = 44960
05-02                                HashAgg(group=[{0, 1}], maxSize=[MAX($2)]) : rowType
= RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689219E9, cumulative cost
= {4.65954202971E11 rows, 3.054588663921E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12 memory},
id = 44959
05-03                                  Project(server=[$0], dayId=[$1], $f2=[LENGTH($2)])
: rowType = RecordType(ANY server, ANY dayId, ANY $f2): rowcount = 5.1772689219E10, cumulative
cost = {4.14181513752E11 rows, 1.604953365789E12 cpu, 0.0 io, 0.0 network, 0.0 memory}, id
= 44958
05-04                                    SelectionVectorRemover : rowType = RecordType(ANY
dir0, ANY dir1, ANY ITEM): rowcount = 5.1772689219E10, cumulative cost = {3.62408824533E11
rows, 1.397862608913E12 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44957
05-05                                      Filter(condition=[>=($1, 20151001)]) : rowType
= RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 5.1772689219E10, cumulative cost =
{3.10636135314E11 rows, 1.346089919694E12 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44956
05-06                                        Project(dir0=[$0], dir1=[$2], ITEM=[ITEM($1,
2)]) : rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 1.03545378438E11, cumulative
cost = {2.07090756876E11 rows, 7.24817649066E11 cpu, 0.0 io, 0.0 network, 0.0 memory}, id
= 44955
05-07                                          Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/archive/psn,
numFiles=116213, columns=[`dir0`, `dir1`, `columns`[2]], files=[maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-04.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-18.30.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-17.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-23.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-09.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-15.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-14.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-20.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-09.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.30.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-10.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-01.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-07.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-18.15.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-21.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-11.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-08.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-15.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-05.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-16.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-19.30.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-07.15.sink, ...
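The Scan above reports numFiles=116213, and every listed file sits under a 20150130 directory, which `dir1 >= 20151001` should have excluded. The Python sketch below checks scanned paths copied from the plan against that filter. It assumes the layout `maprfs:/archive/psn/<dir0>/<dir1>/<file>` implied by selectionRoot. The helper names are illustrative and not part of any Drill API.

```python
SELECTION_ROOT = "maprfs:/archive/psn"


def partition_dirs(path: str, root: str = SELECTION_ROOT) -> list[str]:
    """Return the directory components (dir0, dir1, ...) between root and the file name."""
    if not path.startswith(root):
        raise ValueError(f"{path!r} is not under {root!r}")
    parts = path[len(root):].strip("/").split("/")
    return parts[:-1]  # drop the file name; what remains are the partition dirs


def violates_filter(path: str, min_day: int) -> bool:
    """True if the file's dir1 fails `dir1 >= min_day`, i.e. pruning should have skipped it."""
    dirs = partition_dirs(path)
    return len(dirs) >= 2 and dirs[1].isdigit() and int(dirs[1]) < min_day


# Two of the paths listed in the Scan operator above.
scanned = [
    "maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink",
    "maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-04.45.sink",
]
unneeded = [p for p in scanned if violates_filter(p, 20151001)]
print(f"{len(unneeded)} of {len(scanned)} scanned files fall outside dir1 >= 20151001")
```

The same check could be applied to the full file list from the plan to estimate how many of the 116213 files pruning should have removed.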






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
