From: Pavel Velikhov <pavel.velikhov@gmail.com>
To: mail@frensjan.nl
Cc: user@spark.apache.org, user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Re: RDD partitions per executor in Cassandra Spark Connector
Date: Tue, 3 Mar 2015 12:42:24 +0300

Hi, is there a paper or a document where one can read about how Spark reads Cassandra data in parallel, and how it writes data back from RDDs? It's a bit hard to form a clear picture of this in my mind.

Thank you,
Pavel Velikhov
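For reference, the basic read/write path through the connector looks roughly like the sketch below (spark-shell style; the keyspace "ks", the table "words", and its columns are made-up names, and the connection host is an assumption, not something from this thread):

    // Start spark-shell with the connector on the classpath, e.g. with
    //   --conf spark.cassandra.connection.host=127.0.0.1   (assumed address)
    import com.datastax.spark.connector._  // adds cassandraTable / saveToCassandra

    // Read: the connector splits the table's token ring into token ranges
    // and exposes groups of them as the partitions of the resulting RDD.
    val words = sc.cassandraTable("ks", "words")

    // Write: each Spark partition streams its rows back to Cassandra,
    // batching writes as it goes; no shuffle is involved.
    words.map(row => (row.getString("word"), row.getInt("count") + 1))
         .saveToCassandra("ks", "words", SomeColumns("word", "count"))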
> On Mar 3, 2015, at 1:08 AM, Rumph, Frens Jan <mail@frensjan.nl> wrote:
>
> Hi all,
>
> I didn't find the issues button on
> https://github.com/datastax/spark-cassandra-connector/ so posting here.
>
> Anyone have an idea why token ranges are grouped into one partition per
> executor? I expected at least one per core. Any suggestions on how to
> work around this? Doing a repartition is way too expensive, as I just
> want more partitions for parallelism, not a reshuffle ...
>
> Thanks in advance!
> Frens Jan
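One workaround sketch, under the assumption that you are on a 1.x connector: lowering spark.cassandra.input.split.size makes the connector cut the ring into more, smaller splits when the RDD is built, so you get extra partitions without any shuffle. Later releases renamed the property (e.g. spark.cassandra.input.split.size_in_mb), so check the docs for your version; the keyspace/table names below are made up:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    val conf = new SparkConf()
      .setAppName("more-read-partitions")
      .set("spark.cassandra.connection.host", "127.0.0.1")  // assumed address
      // Fewer rows per split => more splits => more Spark partitions.
      .set("spark.cassandra.input.split.size", "10000")     // 1.x default: 100000
    val sc = new SparkContext(conf)

    val rdd = sc.cassandraTable("ks", "words")
    println(s"partitions: ${rdd.partitions.length}")

Printing rdd.partitions.length before and after changing the split size is a quick way to tell whether the grouping you see comes from the split size or from the ring simply having few token ranges.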