spark-dev mailing list archives

From gsvic <>
Subject RE: ShuffledHashJoin Possible Issue
Date Mon, 19 Oct 2015 09:59:51 GMT
Hi Hao,

Each table is created with the following Python snippet:

import json
from math import ceil
from random import random

data = [{'id': 'A%d'%i, 'value':ceil(random()*10)} for i in range(0,50)]
with open('A.json', 'w+') as output:
    json.dump(data, output)

Tables A and B contain 10 and 50 tuples, respectively.
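
For anyone trying to reproduce this, the generation step could be wrapped in a small helper so that both tables come from the same code with different sizes. This is only a sketch: the function name write_table, its parameters, and the use of the table name as the id prefix are assumptions for illustration and are not part of my original script.

```python
import json
import os
from math import ceil
from random import random


def write_table(name, n, directory="."):
    """Write n records to <directory>/<name>.json and return them.

    Hypothetical helper: the name, signature, and id prefix are assumptions.
    """
    # Same record shape as the snippet above: an 'id' built from a prefix
    # plus the row index, and a 'value' drawn from ceil(random() * 10).
    data = [{'id': '%s%d' % (name, i), 'value': ceil(random() * 10)}
            for i in range(n)]
    path = os.path.join(directory, '%s.json' % name)
    with open(path, 'w') as output:
        json.dump(data, output)
    return data
```

Calling write_table('A', 10) and write_table('B', 50) would then produce two files with the sizes mentioned above.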

In the Spark shell I run

sqlContext.setConf("spark.sql.planner.sortMergeJoin", "false") to disable
SortMergeJoin, and
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "0") to disable
BroadcastHashJoin, because the tables are too small and this join will be

Finally, I run the following query:

and the result I get is zero, while ShuffledHashJoin and
SortMergeJoin return the right result (10).
