spark-user mailing list archives

From "Dai, Kevin" <yun...@ebay.com>
Subject Implement customized Join for SparkSQL
Date Mon, 05 Jan 2015 10:28:52 GMT
Hi all,

Suppose I want to join two tables A and B as follows:

Select * from A join B on A.id = B.id

A is a file, while B is a database table indexed by id that I have wrapped with the Data Sources API.
The desired join flow is:

1. Generate A's RDD[Row].

2. Generate B's RDD[Row] from A, using A's ids and B's data source API to fetch the matching rows from the database.

3. Merge these two RDDs into the final RDD[Row].
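A minimal, dependency-free sketch of the three steps above, written in plain Python rather than against Spark itself. It models one partition of A as a list of tuples whose first field is `id`. `fetch_b` and `B_INDEX` are hypothetical stand-ins for a batched lookup through B's indexed database; they are not part of any Spark API. Batching the ids bounds the number of database calls instead of issuing one query per row. In Spark, a function shaped like `lookup_join_partition` would be applied to each partition of A's RDD, for example via `RDD.mapPartitions`.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[object, ...]  # column 0 holds the join key "id"
Fetcher = Callable[[List[object]], Dict[object, List[Row]]]


def probe_batch(batch: Sequence[Row], fetch_b: Fetcher) -> Iterator[Row]:
    # Step 2: look up only the ids present in this batch of A, deduplicated
    # while preserving first-seen order.
    ids = list(dict.fromkeys(row[0] for row in batch))
    matches = fetch_b(ids)
    # Step 3: merge each A row with every B row sharing its id (inner join).
    for a_row in batch:
        for b_row in matches.get(a_row[0], []):
            yield a_row + b_row


def lookup_join_partition(rows_a: Iterable[Row], fetch_b: Fetcher,
                          batch_size: int = 100) -> Iterator[Row]:
    # Step 1 is the input itself: the rows of one partition of A.
    batch: List[Row] = []
    for row in rows_a:
        batch.append(row)
        if len(batch) == batch_size:
            yield from probe_batch(batch, fetch_b)
            batch = []
    if batch:  # flush the final, partially filled batch
        yield from probe_batch(batch, fetch_b)


# Hypothetical in-memory stand-in for B's database: id -> rows with that id.
B_INDEX: Dict[object, List[Row]] = {1: [("b1",)], 3: [("b3",), ("b3x",)]}
calls: List[List[object]] = []  # records each batched lookup for inspection


def fetch_b(ids: List[object]) -> Dict[object, List[Row]]:
    calls.append(ids)
    return {i: B_INDEX[i] for i in ids if i in B_INDEX}


partition_a = [(1, "a1"), (2, "a2"), (3, "a3"), (1, "a1'")]
result = list(lookup_join_partition(partition_a, fetch_b, batch_size=2))
print(result)
```

Rows of A with no match in B (id 2 here) are dropped, matching the inner-join semantics of `A join B on A.id = B.id`; an id with several rows in B (id 3) produces one output row per match.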

However, it seems that none of the existing join strategies supports this.

Is there any way to achieve it?

Best Regards,
Kevin.
