cassandra-user mailing list archives

From Goutham reddy <goutham.chiru...@gmail.com>
Subject Re: Good way of configuring Apache spark with Apache Cassandra
Date Fri, 04 Jan 2019 21:34:56 GMT
Thank you very much, Dor, for the detailed information. Yes, that should be
the primary reason for us to isolate Spark from Cassandra.

Thanks and Regards,
Goutham Reddy


On Fri, Jan 4, 2019 at 1:29 PM Dor Laor <dor@scylladb.com> wrote:

> I strongly recommend option B, separate clusters. Reasons:
>  - Node-to-node networking overhead is negligible compared to networking
> within the node
>  - Different scaling considerations
>    Your workload may require 10 Spark nodes and 20 database nodes, so why
> bundle them?
>    This ratio may also change over time as your application evolves and
> the amount of data changes.
>  - Isolation - If Spark has a spike in CPU/IO utilization, you wouldn't
> want it to affect Cassandra, or vice versa.
>    If you isolate it with cgroups, you may have too much idle time when
> no such spike occurs.
>
>
> On Fri, Jan 4, 2019 at 12:47 PM Goutham reddy <goutham.chirutha@gmail.com>
> wrote:
>
>> Hi,
>> We have a requirement for heavy data lifting and analytics, and have
>> decided to go with Apache Spark. In the process we have come up with two
>> patterns:
>> a. Apache Spark and Apache Cassandra co-located, sharing the same nodes.
>> b. Apache Spark on one independent cluster and Apache Cassandra on another
>> independent cluster.
>>
>> We need a good pattern for using the analytics engine with Cassandra.
>> Thanks in advance.
>>
>> Regards
>> Goutham.
>>
>
