cassandra-user mailing list archives

From Goutham reddy <goutham.chiru...@gmail.com>
Subject Re: Good way of configuring Apache spark with Apache Cassandra
Date Sat, 05 Jan 2019 04:03:08 GMT
Thanks, Jonathan. I believe we have to reconsider how we perform our
analytics.

On Fri, Jan 4, 2019 at 1:46 PM Jonathan Haddad <jon@jonhaddad.com> wrote:

> If you absolutely have to use Cassandra as the source of your data, I
> agree with Dor.
>
> That being said, if you're going to be doing a lot of analytics, I
> recommend using something other than Cassandra with Spark. The performance
> isn't particularly wonderful, and you'll likely see anywhere from a 10-50x
> improvement by putting the data in an analytics-friendly format (Parquet)
> on a block/blob store (DFS or S3) instead.
>
> On Fri, Jan 4, 2019 at 1:43 PM Goutham reddy <goutham.chirutha@gmail.com>
> wrote:
>
>> Thank you very much, Dor, for the detailed information. Yes, that should
>> be the primary reason we need to keep Spark isolated from Cassandra.
>>
>> Thanks and Regards,
>> Goutham Reddy
>>
>>
>> On Fri, Jan 4, 2019 at 1:29 PM Dor Laor <dor@scylladb.com> wrote:
>>
>>> I strongly recommend option B, separate clusters. Reasons:
>>>  - Networking: the overhead of node-to-node traffic is negligible
>>> compared with keeping traffic within the node
>>>  - Different scaling considerations
>>>    Your workload may require 10 Spark nodes and 20 database nodes, so
>>> why bundle them?
>>>    This ratio may also change over time as your application evolves and
>>> the amount of data changes.
>>>  - Isolation: if Spark has a spike in CPU/IO utilization, you don't
>>> want it to affect Cassandra, and vice versa.
>>>    If you isolate them with cgroups, you may have too much idle time
>>> when no such spike is happening.
>>>
>>>
>>> On Fri, Jan 4, 2019 at 12:47 PM Goutham reddy <
>>> goutham.chirutha@gmail.com> wrote:
>>>
>>>> Hi,
>>>> We have a requirement for heavy data lifting and analytics, and we have
>>>> decided to go with Apache Spark. In the process we have come up with two
>>>> patterns:
>>>> a. Apache Spark and Apache Cassandra co-located and sharing the same
>>>> nodes.
>>>> b. Apache Spark as one independent cluster and Apache Cassandra as
>>>> another independent cluster.
>>>>
>>>> We need guidance on a good pattern for using the analytics engine with
>>>> Cassandra. Thanks in advance.
>>>>
>>>> Regards
>>>> Goutham.
>>>>
>>>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
-- 
Regards
Goutham Reddy
