cassandra-user mailing list archives

From Jonathan Haddad <...@jonhaddad.com>
Subject Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose
Date Mon, 07 Mar 2016 18:17:28 GMT
If you're doing 100 searches a second, each machine will be serving at most
100 requests per second, not 2000.
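
To see the correction concretely: in a full scatter-gather, each query sends at most one sub-request to each node, so the per-node rate equals the query rate, and 2000 is the aggregate across the whole cluster. The sketch below is a back-of-the-envelope model under that worst-case assumption (every query fans out to every node). The helper name `scatter_gather_load` is hypothetical, and real coordinators may contact fewer nodes than this upper bound.

```python
def scatter_gather_load(nodes: int, queries_per_sec: int) -> dict:
    """Worst-case request rates when every query fans out to every node.

    Hypothetical helper: assumes each query sends exactly one sub-request
    to each node, which is the upper bound discussed in the thread.
    """
    per_node = queries_per_sec               # each node sees one sub-request per query
    cluster_total = nodes * queries_per_sec  # summed over all nodes
    return {"per_node": per_node, "cluster_total": cluster_total}


for n in (20, 40):
    print(n, scatter_gather_load(nodes=n, queries_per_sec=100))
# 20 {'per_node': 100, 'cluster_total': 2000}
# 40 {'per_node': 100, 'cluster_total': 4000}
```

Adding nodes grows the cluster-wide total but leaves the per-node rate unchanged, although the coordinator still has to gather more responses per query as the cluster grows.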

On Mon, Mar 7, 2016 at 10:13 AM Bhuvan Rawal <bhu1rawal@gmail.com> wrote:

> Well, that's certainly true. There are a few points worth discussing here:
>
> 1. Scatter-gather queries, especially if the cluster is large. Say we have
> a 20-node cluster and we are searching 100 times a second; then
> effectively the coordinator would be hitting each node 2000 times (20*100).
> That factor will only increase as the number of nodes grows. I'm sure
> having a centralized index alleviates that problem.
> 2. High cardinality (for columns like email or phone number).
> 3. Low cardinality (a boolean column or any column with a limited set of
> possible values).
>
> SASI seems to be a good solution for LIKE queries; this doc
> <https://github.com/apache/cassandra/blob/trunk/doc/SASI.md> looks really
> promising. But from a design standpoint, wouldn't it be better to handle
> search use cases separately from data storage ones?
>
> On Sun, Mar 6, 2016 at 9:14 PM, Jack Krupansky <jack.krupansky@gmail.com>
> wrote:
>
>> I don't have any direct personal experience with Stratio. It will all
>> depend on your queries and your data cardinality: some queries are fine
>> with secondary indexes while others are quite poor. Ditto for Lucene and
>> Solr.
>>
>> It is also worth noting that the new SASI feature of Cassandra supports
>> keyword and prefix/suffix search. But it doesn't support multi-column ad
>> hoc queries, which is what people tend to use Lucene and Solr for. So,
>> again, it all depends on your queries and your data cardinality.
>>
>> -- Jack Krupansky
>>
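To make the pattern shapes concrete, here is a small Python emulation of how LIKE 'prefix%', '%suffix' and '%contains%' match values of a single text column. It models only the matching semantics; SASI itself answers these queries from its index rather than by scanning values. The table and column in the comment are hypothetical, and, as we read the SASI doc linked above, the suffix and contains shapes need an index created in CONTAINS mode, while the default PREFIX mode handles only 'prefix%'. Treating a pattern with no `%` as an exact match is a simplification of this sketch.

```python
# Hypothetical CQL for the kind of index this emulates (names are made up):
#   CREATE CUSTOM INDEX ON users (email)
#   USING 'org.apache.cassandra.index.sasi.SASIIndex'
#   WITH OPTIONS = {'mode': 'CONTAINS'};


def like_match(value: str, pattern: str) -> bool:
    """Emulate the LIKE pattern shapes discussed above on one text value."""
    if len(pattern) >= 2 and pattern.startswith("%") and pattern.endswith("%"):
        return pattern[1:-1] in value          # '%term%' -> contains
    if pattern.endswith("%"):
        return value.startswith(pattern[:-1])  # 'term%'  -> prefix
    if pattern.startswith("%"):
        return value.endswith(pattern[1:])     # '%term'  -> suffix
    return value == pattern                    # no wildcard -> exact match (simplification)


emails = ["alice@example.com", "bob@example.org", "carol@test.com"]
print([e for e in emails if like_match(e, "bob%")])       # ['bob@example.org']
print([e for e in emails if like_match(e, "%.com")])      # ['alice@example.com', 'carol@test.com']
print([e for e in emails if like_match(e, "%example%")])  # ['alice@example.com', 'bob@example.org']
```

Each call looks at one column value at a time, which mirrors Jack's point: this covers keyword and prefix/suffix matching, not the multi-column ad hoc querying people usually reach for Lucene or Solr to get.
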
>> On Sun, Mar 6, 2016 at 1:29 AM, Bhuvan Rawal <bhu1rawal@gmail.com> wrote:
>>
>>> Yes Jack, we are rolling out Stratio right now; we will assess the
>>> performance benefit it yields and can move to Elasticsearch/Solr later.
>>>
>>> In your experience, how does Stratio perform vis-a-vis secondary
>>> indexes?
>>>
>>> On Sun, Mar 6, 2016 at 11:15 AM, Jack Krupansky <
>>> jack.krupansky@gmail.com> wrote:
>>>
>>>> You haven't been clear about how you intend to add Solr. You can also
>>>> use Stratio or Stargate for basic Lucene search if you don't need full
>>>> Solr support and want to stick to open source rather than go with DSE
>>>> Search for Solr.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Sun, Mar 6, 2016 at 12:25 AM, Bhuvan Rawal <bhu1rawal@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Sean and Nirmallaya.
>>>>>
>>>>> @Jack, we are going with DSC right now and plan to use Spark, and later
>>>>> Solr, over the analytics DC. The use case is to keep the OLAP and OLTP
>>>>> workloads separate rather than intertwined, whether that is achieved by
>>>>> creating a new DC or a new cluster altogether. From Nirmallaya's and
>>>>> Sean's answers I understand that it's easily achievable by creating a
>>>>> separate DC: the app client will need to be made DC-aware so that it
>>>>> never uses a node in DC3 as a coordinator. The same goes for the Spark
>>>>> configuration, which should read from the third DC. Correct me if I'm
>>>>> wrong.
>>>>>
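Bhuvan's summary boils down to a client-side rule: each workload picks coordinators only from its own DC. The sketch below simulates that rule in plain Python with a hypothetical host list (dc1 and dc2 for OLTP, dc3 for analytics) and a hypothetical `local_coordinators` helper. In a real deployment you would configure the driver rather than write this yourself; for example, the DataStax Python driver's `DCAwareRoundRobinPolicy` takes a `local_dc` argument and, with its default `used_hosts_per_remote_dc=0`, does not route to remote DCs. Pairing that with a LOCAL_* consistency level such as LOCAL_QUORUM keeps the replicas a request waits on in the local DC. The Spark Cassandra Connector has a similar local-DC setting (`spark.cassandra.connection.local_dc` in the versions we know of; check the reference for yours).

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Host:
    address: str
    datacenter: str


def local_coordinators(hosts, local_dc):
    """Return the hosts a DC-aware client may use as coordinators."""
    chosen = [h for h in hosts if h.datacenter == local_dc]
    if not chosen:
        raise ValueError(f"no hosts found in local DC {local_dc!r}")
    return chosen


# Hypothetical topology: dc1/dc2 serve OLTP traffic, dc3 is the analytics DC.
HOSTS = [
    Host("10.0.1.1", "dc1"), Host("10.0.1.2", "dc1"),
    Host("10.0.2.1", "dc2"), Host("10.0.2.2", "dc2"),
    Host("10.0.3.1", "dc3"), Host("10.0.3.2", "dc3"),
]

# OLTP app servers round-robin over dc1 hosts only.
app_pool = cycle(local_coordinators(HOSTS, "dc1"))
# Spark jobs are restricted to the analytics DC.
spark_hosts = local_coordinators(HOSTS, "dc3")

print([next(app_pool).address for _ in range(4)])
# ['10.0.1.1', '10.0.1.2', '10.0.1.1', '10.0.1.2']
print([h.address for h in spark_hosts])
# ['10.0.3.1', '10.0.3.2']
```

Raising an error when the local DC has no hosts, instead of silently falling back to another DC, keeps analytics traffic from spilling onto the OLTP nodes in this sketch.
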
>>>>> On Mar 4, 2016 7:55 PM, "Jack Krupansky" <jack.krupansky@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > DataStax Enterprise (DSE) should be fine for three or even four data
>>>>> > centers in the same cluster. Or are you talking about some custom Solr
>>>>> > implementation?
>>>>> >
>>>>> > -- Jack Krupansky
>>>>> >
>>>>> > On Fri, Mar 4, 2016 at 9:21 AM, <SEAN_R_DURITY@homedepot.com> wrote:
>>>>> >>
>>>>> >> Sure. Just add a new DC. Alter your keyspaces with a new replication
>>>>> >> factor for that DC. Run repairs on the new DC to get the data
>>>>> >> streamed. Then make sure your clients only connect to the DC(s) that
>>>>> >> they need.
>>>>> >>
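The keyspace step Sean describes is one CQL statement per keyspace, assuming the keyspace uses NetworkTopologyStrategy. The sketch below only builds that statement as a string; the keyspace name `my_app`, the DC names and the RF values are hypothetical, and the DC names must match what your snitch reports (for example, the `dc` entries in cassandra-rackdc.properties with GossipingPropertyFileSnitch). The analytics DC (dc3) gets a smaller RF, in line with Sean's suggestion further down. Existing DCs must be listed again, because the new replication map replaces the old one.

```python
from typing import Mapping


def alter_keyspace_cql(keyspace: str, rf_per_dc: Mapping[str, int]) -> str:
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    The map must list every DC that should keep replicas, including the
    existing ones: the new replication settings replace the old ones.
    """
    for dc, rf in rf_per_dc.items():
        if rf < 1:  # sanity check for this sketch only
            raise ValueError(f"replication factor for {dc!r} must be >= 1")
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {rf}" for dc, rf in rf_per_dc.items()]
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{', '.join(options)}}};"


print(alter_keyspace_cql("my_app", {"dc1": 3, "dc2": 3, "dc3": 2}))
# ALTER KEYSPACE my_app WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3, 'dc3': 2};
```

After running the statement (for example in cqlsh), the new DC's nodes still need the existing data streamed to them. Sean suggests repair; `nodetool rebuild` naming an existing DC as the source is another commonly used option worth checking in the docs for your version.
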
>>>>> >> Separation of workloads is one of the key powers of a Cassandra
>>>>> cluster.
>>>>> >>
>>>>> >> You may want to look at different configurations for the analytics
>>>>> >> cluster: a smaller replication factor, more memory per node, more disk
>>>>> >> per node, perhaps fewer vnodes. Others may chime in with their
>>>>> >> experience.
>>>>> >>
>>>>> >> Sean Durity
>>>>> >>
>>>>> >> From: Bhuvan Rawal [mailto:bhu1rawal@gmail.com]
>>>>> >> Sent: Friday, March 04, 2016 3:27 AM
>>>>> >> To: user@cassandra.apache.org
>>>>> >> Subject: How to create an additional cluster in Cassandra exclusively for Analytics Purpose
>>>>> >>
>>>>> >> Hi,
>>>>> >>
>>>>> >> We would like to create an additional C* data center for batch
>>>>> >> processing using Spark on CFS. We would like to limit this DC
>>>>> >> exclusively to Spark operations, while the application servers
>>>>> >> continue fetching data from the OLTP DC.
>>>>> >>
>>>>> >> Is there any way to configure this?
>>>>> >>
>>>>> >>
>>>>> >> Regards,
>>>>> >>
>>>>> >> Bhuvan
