cassandra-user mailing list archives

From Kai Wang <dep...@gmail.com>
Subject Re: Query regarding CassandraJavaRDD while running spark job on cassandra
Date Thu, 24 Mar 2016 15:14:41 GMT
I suggest you post this to spark-cassandra-connector list.

On Sat, Mar 12, 2016 at 12:52 AM, Siddharth Verma <
verma.siddharth@snapdeal.com> wrote:

> In cassandra I have a table with the following schema.
>
> CREATE TABLE my_keyspace.my_table1 (
>     col_1 text,
>     col_2 text,
>     col_3 text,
>     col_4 text,
>     col_5 text,
>     col_6 text,
>     col_7 text,
>     PRIMARY KEY (col_1, col_2, col_3)
> ) WITH CLUSTERING ORDER BY (col_2 ASC, col_3 ASC);
>
> For processing I create a spark job.
>
> CassandraJavaRDD<CassandraRow> data1 =
> function.cassandraTable("my_keyspace", "my_table1");
>
>
> 1. Does it guarantee mutual exclusivity of fetched rows across all RDDs
> which are on worker nodes?
> (At the cost of redundancy and verbosity, I will reiterate.
> Suppose I have an entry in the table : ('1','2','3','4','5','6','7')
> What I mean to ask is: when I perform transformations/actions on the data1
> RDD, can I be sure that the above entry will be present on ONLY ONE worker
> node?)
>
> 2. Will all the data pertaining to one partition be on one node?
> (Suppose I have the following entries in the table :
> ('p1','c2_1','c3_1','4','5','6','7')
> ('p1','c2_2','c3_2','4','5','6','7')
> ('p1','c2_3','c3_3','4','5','6','7')
> ('p1','c2_4','c3_4','4','5','6','7')
> ('p1' ........)
> ('p1' ........)
> ('p1' ........)
> Will all the data for the same partition be present on only one node?
> )
>
> 3. If I have a DC specifically for analytics, and I place the Spark workers
> on the same machines as the Cassandra nodes for that entire DC,
> can I make sure that each Spark worker fetches data only from the token
> ranges present on its own node? (I.e. the worker doesn't fetch data present
> on a different node.)
> 3.1 (as with the above statement which doesn't have a 'where' clause).
> 3.2 (as with the above statement which has a 'where' clause).
>
