spark-user mailing list archives

From Franc Carter <franc.car...@rozettatech.com>
Subject Re: Reading from a centralized stored
Date Tue, 06 Jan 2015 01:59:25 GMT
Thanks, that's what I suspected.

cheers

On Tue, Jan 6, 2015 at 12:56 PM, Cody Koeninger <cody@koeninger.org> wrote:

> Unless you co-locate Spark executor processes on the machines where the
> data is stored, and use an RDD that knows which node to prefer when
> scheduling a task, then yes, the data will be pulled over the network.
>
> Of the options you listed, S3 and DynamoDB cannot have Spark running on
> the same machines. Cassandra can run on the same nodes as Spark, and
> recent versions of the Spark Cassandra Connector implement preferred
> locations. You can run an RDBMS on the same nodes as Spark, but JdbcRDD
> doesn't implement preferred locations.
>
> On Mon, Jan 5, 2015 at 6:25 PM, Franc Carter <franc.carter@rozettatech.com> wrote:
>
>>
>> Hi,
>>
>> I'm trying to understand how a Spark cluster behaves when the data it is
>> processing resides on a centralized/remote store (S3, Cassandra, DynamoDB,
>> RDBMS, etc.).
>>
>> Does every node in the cluster retrieve all the data from the central
>> store?
>>
>> thanks
>>
>> --
>>
>> *Franc Carter* | Systems Architect | Rozetta Technology
>>
>> franc.carter@rozettatech.com  <franc.carter@rozettatech.com>|
>> www.rozettatechnology.com
>>
>> Tel: +61 2 8355 2515
>>
>> Level 4, 55 Harrington St, The Rocks NSW 2000
>>
>> PO Box H58, Australia Square, Sydney NSW 1215
>>
>> AUSTRALIA
>>
>>
>
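
To make the point about preferred locations concrete, here is a toy model in
plain Python. It is only a sketch, not Spark code and not Spark's actual
scheduler: the names Partition, assign_tasks and count_reads are made up for
illustration, and the round-robin fallback is an assumption. It shows that a
task can read locally only when the source reports a host for its partition
and an executor happens to run on that host; otherwise the read goes over
the network.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Partition:
    index: int
    # Hosts holding this partition's data. An empty list means the source
    # reports no preferred locations (hypothetical stand-in for an RDD hint).
    preferred_hosts: list = field(default_factory=list)


def assign_tasks(partitions, executor_hosts):
    """Toy placement: use a preferred host if an executor runs there,
    otherwise fall back to round-robin over all executors."""
    fallback = cycle(executor_hosts)
    placement = {}
    for p in partitions:
        local = [h for h in p.preferred_hosts if h in executor_hosts]
        host = local[0] if local else next(fallback)
        placement[p.index] = (host, "local" if local else "network")
    return placement


def count_reads(placement):
    """Count how many tasks read locally and how many read over the network."""
    counts = {"local": 0, "network": 0}
    for _host, kind in placement.values():
        counts[kind] += 1
    return counts


executors = ["node1", "node2", "node3"]

# Co-located store whose source reports where each partition lives.
colocated = [Partition(i, [executors[i % 3]]) for i in range(6)]

# Remote store, or a source with no location hints: nothing to prefer.
remote = [Partition(i) for i in range(6)]

colocated_counts = count_reads(assign_tasks(colocated, executors))
remote_counts = count_reads(assign_tasks(remote, executors))
print(colocated_counts)
print(remote_counts)
```

In this model each partition is handed to exactly one task, so the remote
case means six network reads of one partition each, not every node reading
everything.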


