1) If you are using the Thrift server, any cached table is cached for all sessions (I am not sure if this was your question).
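2) If you want to ensure that the smaller table in the join is replicated to all nodes, you can add a broadcast hint on it:
    left.join(broadcast(right), "joinKey")
Here broadcast is the function from Spark SQL's functions package (org.apache.spark.sql.functions in Scala). For background, see https://issues.apache.org/jira/browse/SPARK-8300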


On 26 October 2015 at 20:43, Jags Ramnarayanan <jramnarayan@pivotal.io> wrote:
If you are using Spark SQL and joining two DataFrames, the optimizer automatically broadcasts the smaller table when its estimated size is below spark.sql.autoBroadcastJoinThreshold (10 MB by default; you can raise it if the default is too small).

Otherwise, in code, you can collect any RDD to the driver and broadcast it using the SparkContext.broadcast method.
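
To make the collect-and-broadcast idea concrete, here is a minimal sketch in plain Python (no Spark required) of what a map-side join does once the small table is available on every node. The lists of partitions stand in for an RDD's partitions and the dict stands in for the broadcast value; every name and all the data below are made up for illustration. In a real job the dict would typically come from something like sc.broadcast(small.collectAsMap()) and the per-partition function would be handed to mapPartitions, so treat this as an illustration of the technique rather than Spark code.

```python
from collections import defaultdict


def build_lookup(small_rows):
    """Group the small table's rows by join key (the step done on the driver)."""
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)
    return dict(lookup)


def join_partition(partition, lookup):
    """Inner-join one partition of the large table against the replicated lookup."""
    for key, left_value in partition:
        for right_value in lookup.get(key, []):
            yield key, left_value, right_value


# Hypothetical data: the small table fits in memory, and the large table is
# split into two partitions that would live on different executors.
small_table = [(1, "CA"), (2, "US")]
large_partitions = [
    [(1, "alice"), (3, "carol")],
    [(2, "bob"), (1, "dave")],
]

lookup = build_lookup(small_table)  # stands in for the broadcast value
joined = [row for part in large_partitions for row in join_partition(part, lookup)]
print(joined)
```

Each partition is joined locally against its own copy of the lookup, so the large table never has to be shuffled. The trade-off is that the small table must fit in memory on the driver and on every executor.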

-- Jags

On Mon, Oct 26, 2015 at 11:17 AM, Younes Naguib <Younes.Naguib@tritondigital.com> wrote:

Hi all,


I use the Thrift server, and I cache a table using “cache table mytab”.

Is there any SQL to broadcast it too?



Younes Naguib