drill-issues mailing list archives

From "Nathaniel Auvil (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3897) Partitions not being pruned
Date Tue, 06 Oct 2015 13:15:27 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945007#comment-14945007 ]

Nathaniel Auvil commented on DRILL-3897:
----------------------------------------

This could be a duplicate of DRILL-2517: https://issues.apache.org/jira/plugins/servlet/mobile#issue/DRILL-2517

> Partitions not being pruned
> ---------------------------
>
>                 Key: DRILL-3897
>                 URL: https://issues.apache.org/jira/browse/DRILL-3897
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Nathaniel Auvil
>
> I have a two-deep partitioning structure. Drill is not pruning partitions correctly, as it reads all files under every directory. My source files are tab-delimited.
> My query:
> select dir0 server, dir1 dayId, max(LENGTH(columns[2])) maxSize from dfs.`/archive/psn` where dir1 >= 20151001 group by dir0, dir1 order by maxSize
> Plan snippet showing Drill reading unneeded files:
> 00-00    Screen : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount
= 5.1772689218999994E8, cumulative cost = {4.898214127009591E11 rows, 3.373451719812133E12
cpu, 0.0 io, 9.966863946928127E13 network, 1.51590434033232E12 memory}, id = 44973
> 00-01      Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY
server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.897696400117401E11
rows, 3.3733999471229136E12 cpu, 0.0 io, 9.966863946928127E13 network, 1.51590434033232E12
memory}, id = 44972
> 00-02        SingleMergeExchange(sort0=[2 ASC]) : rowType = RecordType(ANY server, ANY
dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.897696400117401E11
rows, 3.3733999471229136E12 cpu, 0.0 io, 9.966863946928127E13 network, 1.51590434033232E12
memory}, id = 44971
> 01-01          SelectionVectorRemover : rowType = RecordType(ANY server, ANY dayId, ANY
maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.892519131195501E11 rows, 3.3589035941415938E12
cpu, 0.0 io, 9.330681141805055E13 network, 1.51590434033232E12 memory}, id = 44970
> 01-02            Sort(sort0=[$2], dir0=[ASC]) : rowType = RecordType(ANY server, ANY
dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.887341862273601E11
rows, 3.358385867249404E12 cpu, 0.0 io, 9.330681141805055E13 network, 1.51590434033232E12
memory}, id = 44969
> 01-03              Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY
server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.882164593351701E11
rows, 3.2984380301424897E12 cpu, 0.0 io, 9.330681141805055E13 network, 1.50347889491976E12
memory}, id = 44968
> 01-04                HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY server,
ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689218999994E8, cumulative
cost = {4.882164593351701E11 rows, 3.2984380301424897E12 cpu, 0.0 io, 9.330681141805055E13
network, 1.50347889491976E12 memory}, id = 44967
> 02-01                  UnorderedMuxExchange : rowType = RecordType(ANY server, ANY dayId,
ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689218999994E8, cumulative cost
= {4.876987324429801E11 rows, 3.2901543998674497E12 cpu, 0.0 io, 8.48243740164096E13 network,
1.50347889491976E12 memory}, id = 44966
> 03-01                    Project(server=[$0], dayId=[$1], maxSize=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($2))])
: rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D):
rowcount = 5.1772689218999994E8, cumulative cost = {4.871810055507901E11 rows, 3.28963667297526E12
cpu, 0.0 io, 8.48243740164096E13 network, 1.50347889491976E12 memory}, id = 44965
> 03-02                      HashAgg(group=[{0, 1}], maxSize=[MAX($2)]) : rowType = RecordType(ANY
server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.866632786586001E11
rows, 3.2875657654065E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.50347889491976E12 memory},
id = 44964
> 03-03                        Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType
= RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689219E9, cumulative cost
= {4.814860097367001E11 rows, 3.1426022355933E12 cpu, 0.0 io, 8.48243740164096E13 network,
1.3667989953816E12 memory}, id = 44963
> 03-04                          HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType
= RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount
= 5.1772689219E9, cumulative cost = {4.814860097367001E11 rows, 3.1426022355933E12 cpu, 0.0
io, 8.48243740164096E13 network, 1.3667989953816E12 memory}, id = 44962
> 04-01                            UnorderedMuxExchange : rowType = RecordType(ANY server,
ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689219E9, cumulative
cost = {4.7630874081480005E11 rows, 3.0804750085305E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12
memory}, id = 44961
> 05-01                              Project(server=[$0], dayId=[$1], maxSize=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1,
hash64AsDouble($0)))]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D):
rowcount = 5.1772689219E9, cumulative cost = {4.711314718929E11 rows, 3.0752977396086E12 cpu,
0.0 io, 0.0 network, 1.3667989953816E12 memory}, id = 44960
> 05-02                                HashAgg(group=[{0, 1}], maxSize=[MAX($2)]) : rowType
= RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689219E9, cumulative cost
= {4.65954202971E11 rows, 3.054588663921E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12 memory},
id = 44959
> 05-03                                  Project(server=[$0], dayId=[$1], $f2=[LENGTH($2)])
: rowType = RecordType(ANY server, ANY dayId, ANY $f2): rowcount = 5.1772689219E10, cumulative
cost = {4.14181513752E11 rows, 1.604953365789E12 cpu, 0.0 io, 0.0 network, 0.0 memory}, id
= 44958
> 05-04                                    SelectionVectorRemover : rowType = RecordType(ANY
dir0, ANY dir1, ANY ITEM): rowcount = 5.1772689219E10, cumulative cost = {3.62408824533E11
rows, 1.397862608913E12 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44957
> 05-05                                      Filter(condition=[>=($1, 20151001)]) :
rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 5.1772689219E10, cumulative
cost = {3.10636135314E11 rows, 1.346089919694E12 cpu, 0.0 io, 0.0 network, 0.0 memory}, id
= 44956
> 05-06                                        Project(dir0=[$0], dir1=[$2], ITEM=[ITEM($1,
2)]) : rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 1.03545378438E11, cumulative
cost = {2.07090756876E11 rows, 7.24817649066E11 cpu, 0.0 io, 0.0 network, 0.0 memory}, id
= 44955
> 05-07                                          Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/archive/psn,
numFiles=116213, columns=[`dir0`, `dir1`, `columns`[2]], files=[maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-04.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-18.30.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-17.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-23.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-09.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-15.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-14.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-20.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-09.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.30.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-10.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-01.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-07.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-18.15.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-21.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-11.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-08.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-15.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-05.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-16.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-19.30.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-07.15.sink, ...
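
To make the symptom in the Scan above concrete, here is a minimal Python sketch of the check that the query's filter implies for each file path. It is an illustration only, not Drill's pruning logic. The helper names (dir_levels, survives_filter) are hypothetical; the selection root, the two sample paths, and the threshold 20151001 are copied from the query and plan above.

```python
from pathlib import PurePosixPath

# Selection root as reported by the Scan operator in the plan.
SELECTION_ROOT = "/archive/psn"


def dir_levels(path, root=SELECTION_ROOT):
    """Return (dir0, dir1) for a file under root (hypothetical helper).

    dir0 and dir1 are the first and second directory levels below the
    selection root, which is how the query uses them.
    """
    # Drop a leading scheme such as "maprfs:" if present.
    plain = path.split(":", 1)[-1]
    parts = PurePosixPath(plain).relative_to(root).parts
    return parts[0], parts[1]


def survives_filter(path, min_day=20151001):
    """True if the file's dir1 satisfies the query's `dir1 >= 20151001`."""
    _, dir1 = dir_levels(path)
    return int(dir1) >= min_day


# Two of the paths listed in the Scan above.
files = [
    "maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink",
    "maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-04.45.sink",
]

# Files that the filter excludes, so a pruned scan should not need them.
unneeded = [f for f in files if not survives_filter(f)]
```

Both sample files sit under dir1 = 20150130, so neither satisfies the filter and both end up in `unneeded`. Their presence in the Scan's file list is consistent with the report that partitions are not being pruned.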



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
