incubator-cassandra-user mailing list archives

From Tim Uckun <timuc...@gmail.com>
Subject Re: Cassandra vs Elasticsearch.
Date Sun, 04 May 2014 11:03:17 GMT
I have been doing some research on how ES persists to disk, and it seems
to be pretty robust. Basically, when you write a document, it gets written
to a log file on the local disk and also to multiple shards. This happens
synchronously and is tunable. You can also set up a gateway, which keeps
track of system state and can be used to recover indices on boxes that fell
over. There is an S3 gateway for those on Amazon infrastructure. From what I
understand, writing to the gateway is async. Needless to say, using a gateway
will increase network traffic significantly, and recovery from it will be
slower than recovery from the local disk.
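
To make that write path concrete, here is a toy sketch in Python. It is not
Elasticsearch code and does not use any ES API: the names Shard,
write_document and required_acks are made up for illustration, and the
"log file" is just a JSON-lines file that is fsynced on every write. It only
shows the general idea of logging each write locally, writing to several
shard copies synchronously, and replaying the log after a crash.

```python
import json
import os
import tempfile


class Shard:
    """Hypothetical stand-in for one shard copy: an append-only log plus an in-memory index."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.index = {}

    def write(self, doc_id, doc):
        # Append to the log and force it to disk before updating the index.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"id": doc_id, "doc": doc}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.index[doc_id] = doc
        return True

    def recover(self):
        # Rebuild the in-memory index by replaying the log from the start.
        self.index = {}
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    self.index[entry["id"]] = entry["doc"]


def write_document(shards, doc_id, doc, required_acks):
    """Write synchronously to every shard copy; succeed only if enough copies acknowledge."""
    acks = 0
    for shard in shards:
        try:
            if shard.write(doc_id, doc):
                acks += 1
        except OSError:
            pass  # a failed copy simply does not acknowledge
    return acks >= required_acks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shards = [Shard(os.path.join(tmp, f"shard{i}.log")) for i in range(3)]
        ok = write_document(shards, "1", {"temp": 21.5}, required_acks=2)
        print("write accepted:", ok)
        # Simulate a crash: drop the in-memory state, then replay the log.
        shards[0].index = {}
        shards[0].recover()
        print("recovered:", shards[0].index)
```

The required_acks argument plays the role of the "tunable" part: raising it
makes a write wait for more copies before it counts as successful, at the
cost of failing more often when copies are down.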

By default ES uses disk-backed storage, but you can choose to run it as an
in-memory store if you want.

A somewhat terse explanation is here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html


On Sun, May 4, 2014 at 10:43 PM, Jack Krupansky <jack@basetechnology.com> wrote:

>   That’s a key advantage of DataStax Enterprise – Solr is fully
> integrated into the Cassandra cluster, so there is only a single
> infrastructure.
>
> -- Jack Krupansky
>
>  *From:* Tim Uckun <timuckun@gmail.com>
> *Sent:* Sunday, May 4, 2014 6:36 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra vs Elasticsearch.
>
>  I am hesitant about keeping both a Cassandra and an ES cluster because it
> effectively doubles my infrastructure costs.  It may be much cheaper to
> keep the data in log files and have ES index them for searching.  Thanks
> for the input, everybody. There is much to think about here.
>
>
>
>
> On Sat, May 3, 2014 at 7:31 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>
>>  Hello Tim
>>
>> You're absolutely right about ES for the query part. This is the perfect
>> fit for complex queries. Now regarding your question:
>>
>> "What advantages does Cassandra give me over ES?" --> linear scalability
>> & durability. ES is just a super index cluster. I've talked to ES guys. If
>> they do not sell ES right now as a "database for complex search" it's
>> because there is no strong guarantee about durability for your data. Many
>> people just live with it and it's fine. Also, if you store the original
>> data and just pump it into ES it's also fine.
>>
>>
>>
>>
>> On Sat, May 3, 2014 at 9:14 AM, Tim Uckun <timuckun@gmail.com> wrote:
>>
>>>    Hey all.
>>>
>>> I have been trying out some data stores for time series data and
>>> Cassandra was the first on my list because so many people are using it for
>>> the same purpose.  I have read many articles on how to model my time series
>>> data and tried several variations of schemas which I thought made sense for
>>> my data but I have really struggled to run some complex queries I need to
>>> run.  This has led me down a kind of a rabbit hole of trying to create
>>> various "materialized views" and shotgunning the data into multiple tables
>>> which might be able to run my queries.
>>>
>>> In the meantime I also took the same data and pumped it into
>>> Elasticsearch and was able to run almost all the queries I needed without
>>> doing anything fancy. Just put the data in, and run your query. The new
>>> aggregations in ES are pretty slick although they don't seem to be 100%
>>> accurate compared to running the same query in Postgres.
>>>
>>> My question is this.  What advantages does Cassandra give me over ES?
>>> Does it compact the data better? Is it faster to query once your data sizes
>>> are huge? Does it use less bandwidth? Is it easier to administer?
>>>
>>> I know there must be very compelling reasons to use C* because so many
>>> companies are depending on it for their bread and butter so I'd love to
>>> hear your take.
>>>
>>> Thanks.
>>>
>>
>>
>
>
