drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "lizhenmxcz@163.com" <lizhenm...@163.com>
Subject join fail
Date Tue, 10 May 2016 01:53:54 GMT
 
hi:
i run join operation in the drill, i use broadcast and put the small table in the right. The
small table has 32000000 rows. I have set the planner.broadcast_threshold to 100000000. The
cluster has three nodes and every node has 64G memory. when join is running, the memory is
increasing untill the driilbit process exit. But the same query is run successful in the impala
and they are in the same cluster.
here is the query plan.
 
00-00    Screen : rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY avgPageRank):
rowcount = 1.0, cumulative cost = {5.294005617599999E8 rows, 6.356723058846001E10 cpu, 0.0
io, 1.4803953770495996E11 network, 9.1066982688E8 memory}, id = 5015
00-01      Project(sourceIP=[$0], totalRevenue=[$1], avgPageRank=[$2]) : rowType = RecordType(VARCHAR(65535)
sourceIP, DOUBLE totalRevenue, ANY avgPageRank): rowcount = 1.0, cumulative cost = {5.2940056165999985E8
rows, 6.356723058836001E10 cpu, 0.0 io, 1.4803953770495996E11 network, 9.1066982688E8 memory},
id = 5014
00-02        SelectionVectorRemover : rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE
totalRevenue, ANY avgPageRank): rowcount = 1.0, cumulative cost = {5.2940056165999985E8 rows,
6.356723058836001E10 cpu, 0.0 io, 1.4803953770495996E11 network, 9.1066982688E8 memory}, id
= 5013
00-03          Limit(fetch=[1]) : rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue,
ANY avgPageRank): rowcount = 1.0, cumulative cost = {5.2940056065999985E8 rows, 6.356723058736001E10
cpu, 0.0 io, 1.4803953770495996E11 network, 9.1066982688E8 memory}, id = 5012
00-04            SingleMergeExchange(sort0=[1 DESC]) : rowType = RecordType(VARCHAR(65535)
sourceIP, DOUBLE totalRevenue, ANY avgPageRank): rowcount = 457983.77999999997, cumulative
cost = {5.2940055965999985E8 rows, 6.356723058336001E10 cpu, 0.0 io, 1.4803953770495996E11
network, 9.1066982688E8 memory}, id = 5011
01-01              SelectionVectorRemover : rowType = RecordType(VARCHAR(65535) sourceIP,
DOUBLE totalRevenue, ANY avgPageRank): rowcount = 457983.77999999997, cumulative cost = {5.289425758799999E8
rows, 6.356173477800001E10 cpu, 0.0 io, 1.4241183301631998E11 network, 9.1066982688E8 memory},
id = 5010
01-02                TopN(limit=[1]) : rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE
totalRevenue, ANY avgPageRank): rowcount = 457983.77999999997, cumulative cost = {5.284845920999999E8
rows, 6.356127679422001E10 cpu, 0.0 io, 1.4241183301631998E11 network, 9.1066982688E8 memory},
id = 5009
01-03                  Project(sourceIP=[$0], totalRevenue=[$1], avgPageRank=[$2]) : rowType
= RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY avgPageRank): rowcount = 457983.77999999997,
cumulative cost = {5.2802660831999993E8 rows, 6.356127679422001E10 cpu, 0.0 io, 1.4241183301631998E11
network, 9.1066982688E8 memory}, id = 5008
01-04                    HashToRandomExchange(dist0=[[$1]]) : rowType = RecordType(VARCHAR(65535)
sourceIP, DOUBLE totalRevenue, ANY avgPageRank, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount =
457983.77999999997, cumulative cost = {5.2802660831999993E8 rows, 6.356127679422001E10 cpu,
0.0 io, 1.4241183301631998E11 network, 9.1066982688E8 memory}, id = 5007
02-01                      UnorderedMuxExchange : rowType = RecordType(VARCHAR(65535) sourceIP,
DOUBLE totalRevenue, ANY avgPageRank, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 457983.77999999997,
cumulative cost = {5.2756862453999996E8 rows, 6.3553949053740005E10 cpu, 0.0 io, 1.3490822676479999E11
network, 9.1066982688E8 memory}, id = 5006
03-01                        Project(sourceIP=[$0], totalRevenue=[$1], avgPageRank=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)])
: rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY avgPageRank, ANY
E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 457983.77999999997, cumulative cost = {5.2711064076E8
rows, 6.355349106996001E10 cpu, 0.0 io, 1.3490822676479999E11 network, 9.1066982688E8 memory},
id = 5005
03-02                          Project(sourceIP=[$0], totalRevenue=[CASE(=($4, 0), null, $3)],
avgPageRank=[CAST(/(CastHigh(CASE(=($2, 0), null, $1)), $2)):ANY NOT NULL]) : rowType = RecordType(VARCHAR(65535)
sourceIP, DOUBLE totalRevenue, ANY avgPageRank): rowcount = 457983.77999999997, cumulative
cost = {5.2665265698E8 rows, 6.3551659134840004E10 cpu, 0.0 io, 1.3490822676479999E11 network,
9.1066982688E8 memory}, id = 5004
03-03                            HashAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[$SUM0($2)],
agg#2=[$SUM0($3)], agg#3=[$SUM0($4)]) : rowType = RecordType(VARCHAR(65535) sourceIP, INTEGER
$f1, BIGINT $f2, DOUBLE $f3, BIGINT $f4): rowcount = 457983.77999999997, cumulative cost =
{5.2619467320000005E8 rows, 6.3547995264600006E10 cpu, 0.0 io, 1.3490822676479999E11 network,
9.1066982688E8 memory}, id = 5003
03-04                              Project(sourceIP=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4])
: rowType = RecordType(VARCHAR(65535) sourceIP, INTEGER $f1, BIGINT $f2, DOUBLE $f3, BIGINT
$f4): rowcount = 4579837.8, cumulative cost = {5.2161483540000004E8 rows, 6.32915243478E10
cpu, 0.0 io, 1.3490822676479999E11 network, 8.300646816E8 memory}, id = 5002
03-05                                HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(VARCHAR(65535)
sourceIP, INTEGER $f1, BIGINT $f2, DOUBLE $f3, BIGINT $f4, ANY E_X_P_R_H_A_S_H_F_I_E_L_D):
rowcount = 4579837.8, cumulative cost = {5.2161483540000004E8 rows, 6.32915243478E10 cpu,
0.0 io, 1.3490822676479999E11 network, 8.300646816E8 memory}, id = 5001
04-01                                  UnorderedMuxExchange : rowType = RecordType(VARCHAR(65535)
sourceIP, INTEGER $f1, BIGINT $f2, DOUBLE $f3, BIGINT $f4, ANY E_X_P_R_H_A_S_H_F_I_E_L_D):
rowcount = 4579837.8, cumulative cost = {5.170349976E8 rows, 6.3218246943E10 cpu, 0.0 io,
2.2354132992E10 network, 8.300646816E8 memory}, id = 5000
05-01                                    Project(sourceIP=[$0], $f1=[$1], $f2=[$2], $f3=[$3],
$f4=[$4], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) : rowType = RecordType(VARCHAR(65535)
sourceIP, INTEGER $f1, BIGINT $f2, DOUBLE $f3, BIGINT $f4, ANY E_X_P_R_H_A_S_H_F_I_E_L_D):
rowcount = 4579837.8, cumulative cost = {5.124551598E8 rows, 6.32136671052E10 cpu, 0.0 io,
2.2354132992E10 network, 8.300646816E8 memory}, id = 4999
05-02                                      HashAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT($1)],
agg#2=[$SUM0($2)], agg#3=[COUNT($2)]) : rowType = RecordType(VARCHAR(65535) sourceIP, INTEGER
$f1, BIGINT $f2, DOUBLE $f3, BIGINT $f4): rowcount = 4579837.8, cumulative cost = {5.07875322E8
rows, 6.3195347754E10 cpu, 0.0 io, 2.2354132992E10 network, 8.300646816E8 memory}, id = 4998
05-03                                        Project(sourceIP=[$0], pagerank=[$5], adrevenue=[$3])
: rowType = RecordType(VARCHAR(65535) sourceIP, INTEGER pagerank, DOUBLE adrevenue): rowcount
= 4.5798378E7, cumulative cost = {4.62076944E8 rows, 6.0630638586E10 cpu, 0.0 io, 2.2354132992E10
network, 2.40132288E7 memory}, id = 4997
05-04                                          HashJoin(condition=[=($4, $1)], joinType=[inner])
: rowType = RecordType(VARCHAR(65535) sourceip, VARCHAR(65535) desturl, VARCHAR(65535) visitdate,
DOUBLE adrevenue, VARCHAR(65535) pageurl, INTEGER pagerank): rowcount = 4.5798378E7, cumulative
cost = {4.62076944E8 rows, 6.0630638586E10 cpu, 0.0 io, 2.2354132992E10 network, 2.40132288E7
memory}, id = 4996
05-06                                            SelectionVectorRemover : rowType = RecordType(VARCHAR(65535)
sourceip, VARCHAR(65535) desturl, VARCHAR(65535) visitdate, DOUBLE adrevenue): rowcount =
4.5798378E7, cumulative cost = {4.12185402E8 rows, 5.9950076802E10 cpu, 0.0 io, 0.0 network,
0.0 memory}, id = 4993
05-07                                              Filter(condition=[AND(>=(DATEDIFF($2,
'1980-01-01'), 0), <=(DATEDIFF($2, '1980-04-01'), 0))]) : rowType = RecordType(VARCHAR(65535)
sourceip, VARCHAR(65535) desturl, VARCHAR(65535) visitdate, DOUBLE adrevenue): rowcount =
4.5798378E7, cumulative cost = {3.66387024E8 rows, 5.9904278424E10 cpu, 0.0 io, 0.0 network,
0.0 memory}, id = 4992
05-08                                                Scan(groupscan=[HiveScan [table=Table(dbName:default,
tableName:uservisits_copy_huge), columns=[`sourceip`, `desturl`, `visitdate`, `adrevenue`],
numPartitions=0, partitions= null, inputDirectories=[hdfs://nameservice1/HiBench/huge/Join/Input/uservisits]]])
: rowType = RecordType(VARCHAR(65535) sourceip, VARCHAR(65535) desturl, VARCHAR(65535) visitdate,
DOUBLE adrevenue): rowcount = 1.83193512E8, cumulative cost = {1.83193512E8 rows, 5.862192384E10
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4991
05-05                                            BroadcastExchange : rowType = RecordType(VARCHAR(65535)
pageurl, INTEGER pagerank): rowcount = 1364388.0, cumulative cost = {2728776.0 rows, 1.20066144E8
cpu, 0.0 io, 2.2354132992E10 network, 0.0 memory}, id = 4995
06-01                                              Scan(groupscan=[HiveScan [table=Table(dbName:default,
tableName:rankings_huge), columns=[`pageurl`, `pagerank`], numPartitions=0, partitions= null,
inputDirectories=[hdfs://nameservice1/HiBench/huge/Join/Input/rankings]]]) : rowType = RecordType(VARCHAR(65535)
pageurl, INTEGER pagerank): rowcount = 1364388.0, cumulative cost = {1364388.0 rows, 1.0915104E8
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4994
--
 
thanks for your regards.
 
 
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message