cassandra-user mailing list archives

From Romain Hardouin <>
Subject Re: Heavy one-off writes best practices
Date Tue, 06 Feb 2018 19:06:15 GMT
 We use Spark2Cassandra (this fork works with C* 3.0).
SSTables are streamed to Cassandra by Spark2Cassandra, so you need to open port 7000 (the inter-node storage port) accordingly. During
the benchmark we used 25 EMR nodes, but in production we use fewer nodes to be more gentle with

    On Tuesday, 6 February 2018 at 16:05:16 UTC+1, Julien Moumne <>
wrote:
 This does look like a very viable solution. Thanks.
Could you give us some pointers/documentation on:
- how can we build such SSTables, using Spark jobs maybe?
- how do we send these tables to Cassandra? Does a simple SCP work?
- what is the recommended size for SSTables when the data does not fit a single executor?
On 5 February 2018 at 18:40, Romain Hardouin <> wrote:

  Hi Julien,
We have such a use case on some clusters. If you want to insert big batches at a fast pace, the
only viable solution is to generate SSTables on the Spark side and stream them to C*. The last time
we benchmarked such a job, we achieved 1.3 million partitions inserted per second on a 3-node C*
test cluster - which is impossible with regular inserts.
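For readers looking for a starting point: the SSTable-generation step described here is typically done with Cassandra's CQLSSTableWriter (shipped in the cassandra-all artifact), which a Spark job can invoke inside each executor. A minimal sketch follows; the keyspace, table schema, and output path are illustrative, not from the original thread:

```java
import java.io.File;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class SSTableBuild {
    public static void main(String[] args) throws Exception {
        // Illustrative schema and prepared insert -- adapt to your own table.
        String schema = "CREATE TABLE ks.events (id int PRIMARY KEY, payload text)";
        String insert = "INSERT INTO ks.events (id, payload) VALUES (?, ?)";

        // The output directory must follow the <keyspace>/<table> layout.
        File outDir = new File("/tmp/sstables/ks/events");
        outDir.mkdirs();

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory(outDir)
                .forTable(schema)
                .using(insert)
                // Buffer size controls how much data is accumulated before an
                // SSTable is flushed, i.e. roughly the on-disk SSTable size.
                .withBufferSizeInMB(128)
                .build();

        // Each addRow call binds the prepared insert's parameters in order.
        for (int i = 0; i < 1000; i++) {
            writer.addRow(i, "payload-" + i);
        }
        writer.close(); // flushes the remaining SSTable files to outDir
    }
}
```

The resulting files can then be streamed into the cluster, e.g. with the bundled sstableloader tool (`sstableloader -d <host> /tmp/sstables/ks/events`), which also bears on the SCP question earlier in the thread: SSTables are streamed over the storage port, not simply copied into a node's data directory.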
    On Monday, 5 February 2018 at 03:54:09 UTC+1, kurt greaves <>
wrote:
Would you know if there is evidence that inserting skinny rows in sorted order (no batching)
helps C*?
This won't have any effect as each insert will be handled separately by the coordinator (or
a different coordinator, even). Sorting is also very unlikely to help even if you did batch.
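To illustrate why sorted insertion order doesn't help: Cassandra's partitioner hashes each partition key to a token, so keys that are adjacent in sort order land on effectively unrelated positions of the ring. The self-contained sketch below uses a simple 64-bit mixing hash as a stand-in for the real Murmur3Partitioner (the constants and key names are illustrative, not Cassandra's):

```java
import java.util.Arrays;
import java.util.List;

public class TokenScatter {
    // Simple 64-bit mixing hash, a stand-in for Cassandra's Murmur3
    // partitioner (org.apache.cassandra.dht.Murmur3Partitioner).
    static long token(String partitionKey) {
        long h = 1125899906842597L; // prime seed
        for (int i = 0; i < partitionKey.length(); i++) {
            h = 31 * h + partitionKey.charAt(i);
        }
        // Finalization mix: each step is a bijection on 64-bit values,
        // so distinct keys keep distinct tokens while nearby inputs
        // are scattered across the token range.
        h ^= (h >>> 33);
        h *= 0xff51afd7ed558ccdL;
        h ^= (h >>> 33);
        return h;
    }

    public static void main(String[] args) {
        // Keys inserted in sorted order...
        List<String> keys = Arrays.asList(
                "user:0001", "user:0002", "user:0003", "user:0004");
        // ...map to tokens with no relation to that order, so each
        // insert is routed to an effectively arbitrary replica set.
        for (String k : keys) {
            System.out.println(k + " -> " + token(k));
        }
    }
}
```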

 Also, in the case of wide rows, is there evidence that sorting clustering keys within partition
batches helps ease C*'s job?
No evidence, seems very unlikely.

Software Engineering - Data Science
Mail: jmoumne@deezer.com
12 rue d'Athènes 75009 Paris - France