spark-dev mailing list archives

From Deepak Gopalakrishnan <dgk...@gmail.com>
Subject Mapper side join with DataFrames API
Date Mon, 29 Feb 2016 15:45:28 GMT
Hello,

I'm trying to join two DataFrames, A and B, with:

sqlContext.sql("SELECT * FROM A INNER JOIN B ON A.a=B.a");

I have registered temp tables for A and B after loading these DataFrames
from different sources. I need the join to be really fast, and I was
wondering whether there is a way to keep using the SQL statement and still
get a mapper side join (say my table B is small)?

I have read some articles on using broadcast to do mapper side joins. Could
I do something like this and then execute my SQL statement to achieve a
mapper side join?

DataFrame B = sparkContext.broadcast(B);
B.registerTempTable("B");
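
To make the question concrete, this is roughly what I mean by a mapper side
join. It is only a sketch in plain Java collections, not the Spark API; the
class name MapSideJoinSketch, the join method and the {key, value} row layout
are made up for illustration. The small table is held in memory as a hash
map (the part I would expect to be broadcast) and the large table is
streamed past it without being shuffled:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java collections, not the Spark API.
final class MapSideJoinSketch {

    // Inner-joins two tables of {key, value} rows on the key column.
    // Returns rows of the form {key, largeValue, smallValue}.
    static List<String[]> join(List<String[]> large, List<String[]> small) {
        // Build phase: load the small table into an in-memory lookup.
        // This is the copy each mapper would hold locally.
        Map<String, List<String>> lookup = new HashMap<>();
        for (String[] row : small) {
            lookup.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }

        // Probe phase: stream the large table and look each key up locally.
        List<String[]> out = new ArrayList<>();
        for (String[] row : large) {
            List<String> matches = lookup.get(row[0]);
            if (matches == null) {
                continue; // inner join: rows without a match are dropped
            }
            for (String match : matches) {
                out.add(new String[] {row[0], row[1], match});
            }
        }
        return out;
    }
}
```

Here A would play the role of the large argument and B the small one.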



-- 
Regards,
*Deepak Gopalakrishnan*
*Mobile*:+918891509774
*Skype* : deepakgk87
http://myexps.blogspot.com
