cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cassa L <lcas...@gmail.com>
Subject Re: Why don't I see my spark jobs running in parallel in Cassandra/Spark DSE cluster?
Date Fri, 27 Oct 2017 06:50:20 GMT
No, I dont use Yarn.  This is standalone spark that comes with DataStax
Enterprise version of Cassandra.

On Thu, Oct 26, 2017 at 11:22 PM, Jörn Franke <jornfranke@gmail.com> wrote:

> Do you use yarn ? Then you need to configure the queues with the right
> scheduler and method.
>
> On 27. Oct 2017, at 08:05, Cassa L <lcassa8@gmail.com> wrote:
>
> Hi,
> I have a spark job that has use case as below:
> RRD1 and RDD2 read from Cassandra tables. These two RDDs then do some
> transformation and after that I do a count on transformed data.
>
> Code somewhat  looks like this:
>
> RDD1=JavaFunctions.cassandraTable(...)
> RDD2=JavaFunctions.cassandraTable(...)
> RDD3 = RDD1.flatMap(..)
> RDD4 = RDD2.flatMap()
>
> RDD3.count
> RDD4.count
>
> In Spark UI I see count() functions are getting called one after another.
> How do I make it parallel? I also looked at below discussion from Cloudera,
> but it does not show how to run driver functions in parallel. Do I just add
> Executor and run them in threads?
>
> https://community.cloudera.com/t5/Advanced-Analytics-
> Apache-Spark/Getting-Spark-stages-to-run-in-parallel-
> inside-an-application/td-p/38515
>
> <Screen Shot 2017-10-26 at 10.54.51 PM.png>Attaching UI snapshot here?
>
>
> Thanks.
> LCassa
>
>

Mime
View raw message