cassandra-user mailing list archives

From Pavel Velikhov <pavel.velik...@gmail.com>
Subject Re: RDD partitions per executor in Cassandra Spark Connector
Date Tue, 03 Mar 2015 09:42:24 GMT
Hi, is there a paper or a document where one can read how Spark reads Cassandra data in parallel?
And how it writes data back from RDDs? It's a bit hard to have a clear picture in mind.

Thank you,
Pavel Velikhov

> On Mar 3, 2015, at 1:08 AM, Rumph, Frens Jan <mail@frensjan.nl> wrote:
> 
> Hi all,
> 
> I didn't find the issues button on https://github.com/datastax/spark-cassandra-connector/ so posting here.
> 
> Does anyone have an idea why token ranges are grouped into one partition per executor? I
> expected at least one per core. Any suggestions on how to work around this? Doing a repartition
> is way too expensive, as I just want more partitions for parallelism, not a reshuffle ...
> 
> Thanks in advance!
> Frens Jan

