cassandra-user mailing list archives

From Jack Krupansky <>
Subject Re: Forming a cluster of embedded Cassandra instances
Date Sun, 14 Feb 2016 19:52:44 GMT
What motivated the use of an embedded instance for development - as opposed
to simply spawning a process for Cassandra?

-- Jack Krupansky

On Sun, Feb 14, 2016 at 2:05 PM, John Sanda <> wrote:

> The project I work on day to day uses an embedded instance of Cassandra,
> but it is intended primarily for development. We embed Cassandra in a
> WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I
> personally do not do this. I use and recommend ccm for development. If you
> do use WildFly, there is also wildfly-cassandra, which deploys Cassandra
> as a custom WildFly extension. In other words, it is deployed in WildFly
> like other subsystems such as EJB and web, not like an application. There
> isn't a whole lot of active development on this, but it could be another
> option.
> For production, we have to support single node clusters (not embedded
> though), and it has been challenging for pretty much all the reasons you
> find people saying not to do so.
> As for failure detection and cluster membership changes, are you using the
> DataStax driver? You can register an event listener with the driver to
> receive notifications for those things.
> On Sat, Feb 13, 2016 at 6:33 PM, Jonathan Haddad <>
> wrote:
>> +1 to what Jack said. Don't mess with embedded till you understand the
>> basics of the db. You're not making your system any less complex; I'd say
>> you're most likely going to shoot yourself in the foot.
>> On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky <>
>> wrote:
>>> HA requires an odd number of replicas - 3, 5, 7 - so that split-brain
>>> can be avoided. Two nodes would not support HA. You need to be able to
>>> reach a quorum, which is defined as floor(n/2)+1, where n is the number of
>>> replicas. IOW, you cannot update the data if a quorum cannot be reached.
>>> The data on any given node needs to be replicated on at least two other
>>> nodes.
>>> Embedded Cassandra is only for extremely sophisticated developers - not
>>> those who are new to Cassandra, with a "superficial understanding".
>>> As a general proposition, you should not be running application code on
>>> Cassandra nodes.
>>> That said, if any of the senior Cassandra developers wish to personally
>>> support your efforts towards embedded clusters, they are certainly free to
>>> do so. We'll see if any of them step forward.
>>> -- Jack Krupansky
>>> On Sat, Feb 13, 2016 at 3:47 PM, Binil Thomas <> wrote:
>>>> Hi all,
>>>> TL;DR: I have a very superficial understanding of Cassandra and am
>>>> currently evaluating it for a project.
>>>> * Can Cassandra be embedded into another JVM application?
>>>> * Can such embedded instances form a cluster?
>>>> * Can the application use the failure detection and cluster
>>>> membership dissemination infrastructure of embedded Cassandra?
>>>> ----
>>>> I am in the process of re-packaging a SaaS system written in Java to be
>>>> deployed on-premise by customers. The SaaS system currently uses AWS
>>>> DynamoDB. The data storage needs for this application are modest, but I
>>>> would like to keep the deployment complexity to a minimum. Here are three
>>>> different usecases the on-premise system should support:
>>>> 1. single-node deployments with minimal complexity
>>>> 2. two-node HA deployments; the data and processing needs dictated by
>>>> the load on the system are well under what a single node can do, but the
>>>> second node is there to satisfy the HA requirement as a hot standby
>>>> 3. a multi-node clustered deployment, where higher operational
>>>> complexity is justified
>>>> I am considering Cassandra for these usecases.
>>>> For usecase #1, I hope to embed Cassandra into the same JVM as my
>>>> application. I read on the web that CassandraDaemon can be used this way.
>>>> Is that accurate? What other applications embed Cassandra this way? I
>>>> *think* JetBrains Upsource does, but do you know other ones? (Incidentally,
>>>> my Java application embeds Jetty webserver also).
>>>> For usecase #2, I am hoping that I can deploy two instances of this
>>>> ensemble and have the embedded Cassandra instances form a cluster. If I
>>>> configure every write to be replicated on both nodes synchronously, then
>>>> that will satisfy the HA needs of this usecase. Is it feasible to form clusters
>>>> of embedded Cassandra instances?
>>>> For usecase #3, I can form a large cluster of the ensemble where all
>>>> writes are replicated synchronously to a quorum of nodes.
>>>> Finally, in usecase #2 and #3, I'd like to use the failure detection
>>>> and cluster membership dissemination infrastructure of Cassandra from
>>>> within my application. Is it possible to be notified of membership changes
>>>> when embedding Cassandra? I could use a separate library to do this (say,
>>>> with JGroups or Akka) but I fear that if this library and the embedded
>>>> Cassandra instances disagree, it could lead to subtle bugs.
>>>> Thanks,
>>>> Binil
>>>> PS: Cross-posted at
> --
> - John
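
For reference, the quorum rule quoted in the thread (n/2+1 with integer division, i.e. floor(n/2)+1) can be worked through with a small sketch. The class and method names below are hypothetical and are not part of any Cassandra API; the code only illustrates the arithmetic, assuming reads and writes are done at quorum.

```java
// Sketch only: QuorumSketch and its methods are hypothetical names,
// not Cassandra or driver APIs. They illustrate the arithmetic from the thread.
final class QuorumSketch {

    // Quorum for n replicas: floor(n / 2) + 1 (Java int division truncates).
    static int quorumSize(int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be >= 1");
        }
        return replicas / 2 + 1;
    }

    // Number of replicas that can be unavailable while a quorum is still reachable.
    static int tolerableFailures(int replicas) {
        return replicas - quorumSize(replicas);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " replicas: quorum=" + quorumSize(n)
                    + ", tolerable failures=" + tolerableFailures(n));
        }
    }
}
```

With two replicas the quorum is 2, so losing either node makes a quorum unreachable; with three replicas the quorum is still 2, so one node can be down. That is the arithmetic behind the statement that two nodes would not support HA at quorum.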
