From: Pavel Velikhov <pavel.velikhov@gmail.com>
To: mail@frensjan.nl
Cc: user@spark.apache.org, user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Re: RDD partitions per executor in Cassandra Spark Connector
Date: Tue, 3 Mar 2015 12:42:24 +0300

Hi, is there a paper or a document where one can read about how Spark reads Cassandra data in parallel, and how it writes data back from RDDs? It's a bit hard to form a clear picture of this in my mind.

Thank you,
Pavel Velikhov
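For reference, the basic read/write path through the connector looks roughly like the sketch below (spark-shell style; the keyspace "ks", the table "words", and its columns are made-up names, and the connection host is an assumption, not something from this thread):

    // Start spark-shell with the connector on the classpath, e.g. with
    //   --conf spark.cassandra.connection.host=127.0.0.1   (assumed address)
    import com.datastax.spark.connector._  // adds cassandraTable / saveToCassandra

    // Read: the connector splits the table's token ring into token ranges
    // and exposes groups of them as the partitions of the resulting RDD.
    val words = sc.cassandraTable("ks", "words")

    // Write: each Spark partition streams its rows back to Cassandra,
    // batching writes as it goes; no shuffle is involved.
    words.map(row => (row.getString("word"), row.getInt("count") + 1))
         .saveToCassandra("ks", "words", SomeColumns("word", "count"))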
> On Mar 3, 2015, at 1:08 AM, Rumph, Frens Jan <mail@frensjan.nl> wrote:
>
> Hi all,
>
> I didn't find the issues button on
> https://github.com/datastax/spark-cassandra-connector/ so posting here.
>
> Anyone have an idea why token ranges are grouped into one partition per
> executor? I expected at least one per core. Any suggestions on how to
> work around this? Doing a repartition is way too expensive, as I just
> want more partitions for parallelism, not a reshuffle ...
>
> Thanks in advance!
> Frens Jan
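One workaround sketch, under the assumption that you are on a 1.x connector: lowering spark.cassandra.input.split.size makes the connector cut the ring into more, smaller splits when the RDD is built, so you get extra partitions without any shuffle. Later releases renamed the property (e.g. spark.cassandra.input.split.size_in_mb), so check the docs for your version; the keyspace/table names below are made up:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    val conf = new SparkConf()
      .setAppName("more-read-partitions")
      .set("spark.cassandra.connection.host", "127.0.0.1")  // assumed address
      // Fewer rows per split => more splits => more Spark partitions.
      .set("spark.cassandra.input.split.size", "10000")     // 1.x default: 100000
    val sc = new SparkContext(conf)

    val rdd = sc.cassandraTable("ks", "words")
    println(s"partitions: ${rdd.partitions.length}")

Printing rdd.partitions.length before and after changing the split size is a quick way to tell whether the grouping you see comes from the split size or from the ring simply having few token ranges.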