spark-user mailing list archives

From Deenar Toraskar <deenar.toras...@gmail.com>
Subject Re: Broadcast table
Date Tue, 27 Oct 2015 07:28:34 GMT
1) If you are using the Thrift server, any cached tables are cached for all
sessions (I am not sure whether this was your question).
2) If you want to ensure that the smaller table in the join is replicated
to all nodes, you can do the following:

left.join(broadcast(right), "joinKey")

See https://issues.apache.org/jira/browse/SPARK-8300 for details.
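
For anyone who wants to try the hint end to end, here is a minimal sketch. It
assumes Spark 2.x or later (SparkSession; on 1.5 you would use a SQLContext
instead) and a local master. The object name, column names, and sample rows
are made up for illustration and are not from this thread.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical object name, for illustration only.
object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just to make the sketch self-contained.
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: `left` stands in for the large table,
    // `right` for the small one that should be replicated.
    val left = Seq((1, "a"), (2, "b"), (3, "c")).toDF("joinKey", "leftValue")
    val right = Seq((1, "x"), (2, "y")).toDF("joinKey", "rightValue")

    // broadcast() marks `right` as small enough to ship to every executor,
    // regardless of the automatic size threshold.
    val joined = left.join(broadcast(right), "joinKey")

    // Print the physical plan so you can check which join strategy was chosen.
    joined.explain()
    joined.show()

    spark.stop()
  }
}
```

If the hint is picked up, the plan printed by explain() should mention a
broadcast join rather than a shuffle-based one.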

Deenar

On 26 October 2015 at 20:43, Jags Ramnarayanan <jramnarayan@pivotal.io>
wrote:

> If you are using Spark SQL and joining two DataFrames, the optimizer will
> automatically broadcast the smaller table (you can raise the threshold,
> spark.sql.autoBroadcastJoinThreshold, if the default is too small).
>
> Otherwise, in code, you can collect any RDD to the driver and broadcast it
> using the context's broadcast method.
>
> http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
>
> -- Jags
> (www.snappydata.io)
>
>
> On Mon, Oct 26, 2015 at 11:17 AM, Younes Naguib <
> Younes.Naguib@tritondigital.com> wrote:
>
>> Hi all,
>>
>>
>>
>> I use the Thrift server, and I cache a table using "cache table mytab".
>>
>> Is there any SQL to broadcast it too?
>>
>>
>>
>> *Thanks*
>>
>> *Younes Naguib*
>>
>> Triton Digital | 1440 Ste-Catherine W., Suite 1200 | Montreal, QC  H3G
>> 1R8
>>
>> Tel.: +1 514 448 4037 x2688 | Tel.: +1 866 448 4037 x2688 | younes.naguib
>> @tritondigital.com <younes.naguib@streamtheworld.com>
>>
>>
>>
>
>
