cassandra-user mailing list archives

From: Stone Fang <cnstonef...@gmail.com>
Subject: Re: DataStax Spark driver performance for analytics workload
Date: Tue, 10 Oct 2017 13:11:13 GMT
@kurt greaves

I doubt that we really need to read all the data. It is common for a Cassandra
cluster to hold a very large number of records, so if we have to load all of
the data, how can we analyse it?

On Mon, Oct 9, 2017 at 9:49 AM, kurt greaves <kurt@instaclustr.com> wrote:

> spark-cassandra-connector will provide the best way to achieve what you
> want, however under the hood it's still going to result in reading all the
> data, and because of the way Cassandra works it will essentially read the
> same SSTables multiple times from random points. You might be able to tune
> to make this not super bad, but pretty much reading all the data is going
> to have horrible implications for the cache if all your data doesn't fit in
> memory regardless of what you do.
>
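
To illustrate why a full scan revisits the same SSTables, here is a toy model, not the connector's actual code. It assumes the Murmur3Partitioner token range (-2^63 to 2^63-1), a scan that cuts the ring into contiguous token-range splits (one per Spark task), and SSTables that each span the node's whole token range, as is typical under size-tiered compaction. The names `split_ring` and `sstable_reads` are hypothetical helpers for this sketch.

```python
# Toy model (hypothetical helpers, not spark-cassandra-connector code):
# count how many (split, SSTable) reads a full-table scan performs when the
# token ring is cut into contiguous splits. Assumes Murmur3Partitioner's
# token range and SSTables that each span the node's whole token range.

MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_ring(num_splits):
    """Divide [MIN_TOKEN, MAX_TOKEN] into num_splits contiguous inclusive ranges."""
    span = MAX_TOKEN - MIN_TOKEN + 1
    bounds = [MIN_TOKEN + span * i // num_splits for i in range(num_splits + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_splits)]


def overlaps(a, b):
    """True if the inclusive token ranges a and b intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def sstable_reads(splits, sstables):
    """For each split, list the SSTables whose token span that split must seek into."""
    return {
        split: [name for name, rng in sstables.items() if overlaps(split, rng)]
        for split in splits
    }


if __name__ == "__main__":
    # Three hypothetical SSTables, each covering the full token range.
    sstables = {f"sstable-{i}": (MIN_TOKEN, MAX_TOKEN) for i in range(3)}
    splits = split_ring(4)
    reads = sstable_reads(splits, sstables)
    total = sum(len(names) for names in reads.values())
    print(f"{len(splits)} splits x {len(sstables)} SSTables -> {total} separate SSTable reads")
```

Under these assumptions every split has to open and seek into every SSTable, so the number of reads grows with splits times SSTables, and each split starts reading at a different point in each file.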
