cassandra-user mailing list archives

From Alex Ott <alex...@gmail.com>
Subject Re: Performance impact with ALLOW FILTERING clause.
Date Sat, 17 Aug 2019 12:31:51 GMT
The Spark connector doesn't issue a "select * from table;" query. It reads the
data by token ranges
(see https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/partitioner/CassandraPartition.scala#L14)
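
To illustrate what reading by token ranges means, here is a minimal sketch in Python. It is not the connector's code: the function names, the keyspace/table/column names, and the even split of the ring are all assumptions made for illustration. It assumes the Murmur3 partitioner, whose tokens are 64-bit signed integers, and builds one range-restricted query per slice of the ring instead of one unbounded scan.

```python
# Hypothetical sketch: split the token ring into contiguous ranges and
# build one CQL query per range. Not taken from the connector's source.

MIN_TOKEN = -2**63      # lowest Murmur3 token (64-bit signed)
MAX_TOKEN = 2**63 - 1   # highest Murmur3 token


def split_token_ring(num_splits):
    """Return num_splits contiguous (start, end] ranges covering the ring."""
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // num_splits
              for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(keyspace, table, partition_key, start, end):
    """Build a CQL query restricted to one token range (illustrative names)."""
    return (f"SELECT * FROM {keyspace}.{table} "
            f"WHERE token({partition_key}) > {start} "
            f"AND token({partition_key}) <= {end}")


if __name__ == "__main__":
    # 'ks', 'tbl' and 'id' are placeholder names, not from the thread.
    for start, end in split_token_ring(4):
        print(range_query("ks", "tbl", "id", start, end))
```

Each such query touches only the partitions whose tokens fall in its range, so the ranges can be read in parallel and sent to the replicas that own them.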



Jacques-Henri Berthemet  at "Thu, 25 Jul 2019 14:18:57 +0000" wrote:
 JB> Hi Asad,

 JB> That’s because of the way Spark works. Essentially, when you execute a Spark job,
 JB> it pulls the full content of the datastore (Cassandra in your case) into its RDDs
 JB> and works with it “in memory”. While Spark uses “data locality” to read data from
 JB> the nodes that have the required data on their local disks, it’s still reading all
 JB> data from the Cassandra tables. To do so, it sends a ‘select * from Table ALLOW
 JB> FILTERING’ query to Cassandra.

 JB> From Spark you don’t have much control over the initial query that fills the RDDs;
 JB> sometimes you’ll read the whole table even if you only need one row.

 JB> Regards,

 JB> Jacques-Henri Berthemet

 JB> From: "ZAIDI, ASAD A" <az192g@att.com>
 JB> Reply to: "user@cassandra.apache.org" <user@cassandra.apache.org>
 JB> Date: Thursday 25 July 2019 at 15:49
 JB> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
 JB> Subject: Performance impact with ALLOW FILTERING clause.

 JB> Hello Folks,

 JB> I was going through the documentation and saw in many places that ALLOW FILTERING
 JB> causes performance unpredictability. Our developers say the ALLOW FILTERING clause
 JB> is implicitly added to a bunch of queries by the spark-Cassandra connector and they
 JB> cannot control it; however, at the same time we see unpredictability in application
 JB> performance, just as the documentation says.

 JB> I’m trying to understand why a connector would add a clause to a query when it can
 JB> have a negative impact on database/application performance. Is it the data model
 JB> that drives the connector to add ALLOW FILTERING to the query automatically, or is
 JB> there another reason this clause is added in the code? I’m not a developer, but I
 JB> want to know why developers don’t have any control over this.

 JB> I’ll appreciate your guidance here.

 JB> Thanks

 JB> Asad



-- 
With best wishes,                    Alex Ott
Solutions Architect EMEA, DataStax
http://datastax.com/


