cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: scylladb
Date Sun, 12 Mar 2017 20:36:05 GMT
On Sun, Mar 12, 2017 at 3:45 PM, Dor Laor <> wrote:

> On Sun, Mar 12, 2017 at 12:11 PM, Edward Capriolo <>
> wrote:
>> The simple claim that "Scylla IS a drop in replacement for C*" shows
>> that they clearly don't know as much as they think they do.
>> Even if it did supposedly "support everything", it would not actually work
>> like that. For example, some things in Cassandra work "the way they work".
>> They are not specifically defined in a unit test or a document that
>> describes how they are supposed to work. During a massive code port you
>> cannot reason about whether the code still works the same way in all
>> situations. Case in point: without SEDA, using something else, it
>> definitely won't work the same way when the thread pools fill up and it
>> starts blocking, dropping, whatever. There is so much implicitly undefined
>> behavior.
> By your definition there is no such thing as a drop-in replacement, is
> there?
> One of our users asked us to add a protocol verb that identifies Scylla as
> Scylla, so they'll know which is which while they run two clusters.
> Look, if we claimed to have all the features and someone checked and saw we
> don't have LWT, that would be a disservice. Usually, when someone specific is
> interested, we map their C* usage and tell them which features aren't there
> yet. So far, only those not-yet-implemented features hold users back. We do
> try to mimic the exact behaviour of C*.
> Clearly, I can't defend a 100% drop-in replacement. Once we implement
> someone's selected feature set, we're a drop-in replacement for them, while
> we're still not a good match for others.
> We're not after quick wins, quite the opposite.
>> Also, just for argument's sake: YCSB proves nothing. Nothing. It generates
>> key-value data, and, frankly, that is not the primary use case of
>> Cassandra... So again, know what you don't know.
> a. We do not pretend we know it all.
>     We do have three years of mileage with Cassandra and 2.5 with Scylla, and
>     we gained some knowledge... Before we decided to go down the C* path, we
>     considered reimplementing Mongo, HDFS, Kafka and a few others, and the
>     fact that we chose C* shows our appreciation for this project, not the
>     opposite.
> b. YCSB is an industry standard, and that's why everybody uses it.
>     We don't like it at all, since it doesn't have prepared statements (it's
>     time someone merged this support).
>     It's not plain K/V, since it's a table of 10 columns of 100 bytes each.
>     We do support wide rows, and we learned their challenges (the hard way),
>     especially with compaction, repair and streaming. The current Scylla code
>     doesn't cache wide rows beyond 10MB, which isn't ideal. In 1.8 (next
>     month) we will have partial row caching, which is supposed to be very
>     good. During the past 20 months since our beta, we tried to focus on a
>     good out-of-the-box experience for all real workloads, and we knowingly
>     deferred features like LWT because we wanted a good, solid base before
>     reaching feature parity. If we do a good job on a benchmark but a bad one
>     on a real workload, we've just shot ourselves in the foot. This was the
>     case around our beta, but it was just a beta. Today we think we're in a
>     very solid position. We still have lots to complete around repair (which
>     is OK but not great). There is work in progress to switch from Merkle
>     trees to a new algorithm, and to reduce latency (almost there). We have
>     mixed feelings about anti-compaction for incremental repair, but we're
>     likely to go down this route too.
>> On Sun, Mar 12, 2017 at 2:15 PM, Jonathan Haddad <>
>> wrote:
>>> I don't think Jeff comes across as angry.  He's simply pointing out that
>>> ScyllaDB isn't a drop-in replacement for Cassandra.  Saying that it is is
>>> very misleading.  The marketing material should really say something like
>>> "drop-in replacement for some workloads" or "aims to be a drop-in
>>> replacement".  As is, it doesn't support everything, so it's not a drop-in.
>>> On Sat, Mar 11, 2017 at 10:34 PM Dor Laor <> wrote:
>>>> On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa <> wrote:
>>>> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote:
>>>> > Cassandra vs Scylla is a valid comparison because they both are
>>>> compatible. Scylla is a drop-in replacement for Cassandra.
>>>> No, they aren't, and no, it isn't.
>>>> Jeff is angry with us for some reason. I don't know why; it's natural
>>>> that when a new opponent appears there are objections, and the burden of
>>>> proof lies on us.
>>>> We go to great lengths to do so, and we don't just throw out comments
>>>> without backing.
>>>> Scylla IS a drop in replacement for C*. We support the same CQL (from
>>>> version 1.7 it's CQL 3.3.1, protocol v4) and the same SSTable format
>>>> (based on 2.1.8). In the 1.7 release we support the CQL uploader from
>>>> 3.x. We will support the 3.x SSTable format natively in three months'
>>>> time. Soon the whole feature set will be implemented. We have always been
>>>> using this page (not 100% up to date; we'll update it this week):
>>>> We added a jmx-proxy daemon in Java in order to make the transition as
>>>> smooth as possible. Almost all the nodetool commands just work, and
>>>> certainly all the important ones.
>>>> Btw: we have a REST API and Prometheus formats, much better than the
>>>> hairy JMX one.
>>>> Spark, KairosDB, Presto and probably Titan (we added Thrift just for
>>>> legacy users, and we don't intend to decommission an API).
>>>> Regarding benchmarks, if someone finds a flaw in them, we'll do our best
>>>> to fix it.
>>>> Let's ignore them and just hear what our users have to say:
By your definition there is no such thing as a drop-in replacement, is
there?

I think there would be if there were a standard like SQL2007 and you could
say, "given the same data, these same queries produce the same results". To
agree with you, I do believe there are different layers of compatibility, and
of the requirements a given user actually needs vs. what they think they need.
For example, do users care what happens when the READ stage fills up (blocked
socket, exception, whatever)? Maybe the oversubscribed enterprise never hits
this condition, so the debate is meaningless to that entity.

In any case, I would say that personally I am very interested in what
ScyllaDB is up to. I am not dedicated to Cassandra being written in Java, nor
would I say that everything in Cassandra is the ideal, "right" implementation
choice. For example, Paxos/EPaxos: that particular area seems to be at a
sticking point, and a competitive 'kick in the pants' might move things
forward for everyone.

Cassandra is not always the place where new ideas are rapidly embraced;
often, half-jokingly, I say "C* puts the NO in NoSQL :)". I think there is a
right and a wrong way to broach a discussion like this on the Cassandra
mailing list, and it goes beyond YCSB. For example, if there is a great
optimization of Compare And Swap in ScyllaDB, it would be reasonable to
frame the problem in the "real world", far beyond YCSB. Set the stage for
the use case; describe the schema and what the user is going for. Explain
how Apache Cassandra works and what bottlenecks are being hit, then "drop
in" :) ScyllaDB and explain how that workload is executed more efficiently.

If people are still grumpy because their side lost, at least it is a legit
learning lesson. All the grumpy-pants types can decide to bury their faces
in the mud or hand-wave away the use case, but most will respect the
engineering (at least begrudgingly).

Otherwise, as has been pointed out, the C/C++-better-than-Java "aerospikism"
punch won't land well here.
