spark-dev mailing list archives

From gsvic <>
Subject ShuffledHashJoin Possible Issue
Date Sun, 18 Oct 2015 19:55:04 GMT
I am doing some experiments with join algorithms in Spark SQL and am running
into the following issue:

I have constructed two "dummy" JSON tables, t1.json and t2.json. Each of them
has two columns, ID and Value. ID is an incrementing, unique integer and
Value is a random value. I am running an equi-join query on the ID attribute.
With the SortMergeJoin and BroadcastHashJoin algorithms the result is
correct, but with ShuffledHashJoin the count aggregate always returns
zero. The correct result is t2, since t2.ID is a subset of t1.ID, so the
count should equal the number of rows in t2.
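For anyone trying to reproduce the setup, here is a minimal Scala sketch that generates comparable data. The table sizes (1,000 and 100 rows), the every-10th-ID subset, the integer range of Value, the seed, and the helper name `dummyTable` are all assumptions, not details from the report. It writes JSON Lines (one object per line), which is the layout Spark's JSON reader expects by default.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.util.Random

// Hypothetical helper: renders (ID, random Value) rows as JSON Lines.
def dummyTable(ids: Seq[Int], rng: Random): Seq[String] =
  ids.map(id => s"""{"ID":$id,"Value":${rng.nextInt(1000)}}""")

val rng = new Random(42) // fixed seed so the generated data is reproducible

// Assumed sizes: t1 has IDs 1..1000; t2 keeps every 10th ID, so t2.ID is a subset of t1.ID.
val t1Lines = dummyTable(1 to 1000, rng)
val t2Lines = dummyTable((1 to 1000).filter(_ % 10 == 0), rng)

// Write each table to the working directory, one JSON object per line.
Files.write(Paths.get("t1.json"), t1Lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
Files.write(Paths.get("t2.json"), t2Lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
```

The files can then be loaded as DataFrames with `sqlContext.read.json("t1.json")` and `sqlContext.read.json("t2.json")`.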

The query is *t1.join(t2).where(t1("ID").equalTo(t2("ID")))*
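To make the expected result of this query concrete, the following is a minimal single-process Scala sketch of the general shuffled-hash-join technique: hash-partition both inputs on the join key, build a hash table per partition from one side, and probe it with the other. It is not Spark's `ShuffledHashJoin` operator; the partition count, the choice of build side, the `Row` alias, and the data are assumptions made for illustration.

```scala
// Rows are (ID, Value) pairs.
type Row = (Int, Int)

def shuffledHashJoin(left: Seq[Row], right: Seq[Row], numPartitions: Int): Seq[(Row, Row)] = {
  // "Shuffle" phase: route each row to a partition by hashing its join key,
  // so equal IDs from both sides always land in the same partition.
  def partitionOf(key: Int): Int = Math.floorMod(key.hashCode, numPartitions)
  val leftParts  = left.groupBy(r => partitionOf(r._1))
  val rightParts = right.groupBy(r => partitionOf(r._1))

  (0 until numPartitions).flatMap { p =>
    // Build phase: hash table keyed by ID over the right side of this partition.
    val buildTable: Map[Int, Seq[Row]] =
      rightParts.getOrElse(p, Seq.empty).groupBy(_._1)
    // Probe phase: look up each left row's ID and emit every matching pair.
    leftParts.getOrElse(p, Seq.empty).flatMap { l =>
      buildTable.getOrElse(l._1, Seq.empty).map(r => (l, r))
    }
  }
}

// Data shaped like the tables described above: unique IDs, t2.ID a subset of t1.ID.
val t1 = (1 to 1000).map(id => (id, id * 7 % 101))
val t2 = (1 to 1000).filter(_ % 10 == 0).map(id => (id, id % 13))

val joinedCount = shuffledHashJoin(t1, t2, numPartitions = 8).size
println(joinedCount) // prints 100, i.e. t2.size
```

Because matching keys always hash to the same partition, the count does not depend on the number of partitions, and a correct implementation returns the same count as a sort-merge or broadcast join rather than zero.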
