cassandra-user mailing list archives

From John Sanda <>
Subject Re: Forming a cluster of embedded Cassandra instances
Date Sun, 14 Feb 2016 22:26:52 GMT
The motivation was to make it easy for someone to get up and running
quickly with the project. Clone the git repo, run the maven build, and then
you are all set. It definitely does lower the learning curve for someone
just getting started with a project and who is not really thinking about
Cassandra. It also is convenient for non-devs who need to quickly get the
project up and running. For development, we have people working on Linux,
Mac OS X, and Windows. I am not a Windows user and not even sure if ccm
works on Windows, so ccm can't be the de facto standard for development.
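
As a side note on the quorum arithmetic quoted further down (n/2+1), here is
a minimal sketch, assuming n/2 is read as integer division. QuorumMath and
its method names are made up for illustration; they are not part of
Cassandra or any driver.

```java
// Illustrative only: QuorumMath is a hypothetical helper, not a Cassandra API.
final class QuorumMath {
    private QuorumMath() {}

    /** Quorum size for the given replica count: n/2 + 1, with integer division. */
    static int quorum(int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be >= 1");
        }
        return replicas / 2 + 1;
    }

    /** Number of replicas that can be down while a quorum is still reachable. */
    static int tolerableFailures(int replicas) {
        return replicas - quorum(replicas);
    }

    public static void main(String[] args) {
        // Print quorum size and failure tolerance for 1 to 5 replicas.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("replicas=%d quorum=%d tolerable failures=%d%n",
                    n, quorum(n), tolerableFailures(n));
        }
    }
}
```

With two replicas the quorum is two, so losing either node blocks quorum
reads and writes. Three replicas is the smallest count that still reaches a
quorum with one node down.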

On Sun, Feb 14, 2016 at 2:52 PM, Jack Krupansky <> wrote:

> What motivated the use of an embedded instance for development - as
> opposed to simply spawning a process for Cassandra?
> -- Jack Krupansky
> On Sun, Feb 14, 2016 at 2:05 PM, John Sanda <> wrote:
>> The project I work on day to day uses an embedded instance of Cassandra,
>> but it is intended primarily for development. We embed Cassandra in a
>> WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I
>> personally do not do this. I use and recommend ccm
>> <> for development. If you do use WildFly,
>> there is also wildfly-cassandra
>> <> which deploys Cassandra
>> as a custom WildFly extension. In other words it is deployed in WildFly
>> like other subsystems like EJB, web, etc, not like an application. There
>> isn't a whole lot of active development on this, but it could be another
>> option.
>> For production, we have to support single node clusters (not embedded
>> though), and it has been challenging for pretty much all the reasons you
>> find people saying not to do so.
>> As for failure detection and cluster membership changes, are you using
>> the Datastax driver? You can register an event listener with the driver to
>> receive notifications for those things.
>> On Sat, Feb 13, 2016 at 6:33 PM, Jonathan Haddad <>
>> wrote:
>>> +1 to what jack said. Don't mess with embedded till you understand the
>>> basics of the db. You're not making your system any less complex, I'd say
>>> you're most likely going to shoot yourself in the foot.
>>> On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky <>
>>> wrote:
>>>> HA requires an odd number of replicas - 3, 5, 7 - so that split-brain
>>>> can be avoided. Two nodes would not support HA. You need to be able to
>>>> reach a quorum, which is defined as n/2+1 where n is the number of
>>>> replicas. IOW, you cannot update the data if a quorum cannot be reached.
>>>> The data on any given node needs to be replicated on at least two other
>>>> nodes.
>>>> Embedded Cassandra is only for extremely sophisticated developers - not
>>>> those who are new to Cassandra, with a "superficial understanding".
>>>> As a general proposition, you should not be running application code on
>>>> Cassandra nodes.
>>>> That said, if any of the senior Cassandra developers wish to personally
>>>> support your efforts towards embedded clusters, they are certainly free to
>>>> do so. We'll see if any of them step forward.
>>>> -- Jack Krupansky
>>>> On Sat, Feb 13, 2016 at 3:47 PM, Binil Thomas <>
>>>> wrote:
>>>>> Hi all,
>>>>> TL;DR: I have a very superficial understanding of Cassandra and am
>>>>> currently evaluating it for a project.
>>>>> * Can Cassandra be embedded into another JVM application?
>>>>> * Can such embedded instances form a cluster?
>>>>> * Can the application use the failure detection and cluster
>>>>> membership dissemination infrastructure of embedded Cassandra?
>>>>> ----
>>>>> I am in the process of re-packaging a SaaS system written in Java to
>>>>> be deployed on-premise by customers. The SaaS system currently uses AWS
>>>>> DynamoDB. The data storage needs for this application are modest, but
>>>>> I would like to keep the deployment complexity to a minimum. Here are three
>>>>> different usecases the on-premise system should support:
>>>>> 1. single-node deployments with minimal complexity
>>>>> 2. two-node HA deployments; the data and processing needs dictated by
>>>>> the load on the system are well under what a single node can do, but
>>>>> the second node is there to satisfy the HA requirement as a hot standby
>>>>> 3. a multi-node clustered deployment, where higher operational
>>>>> complexity is justified
>>>>> I am considering Cassandra for these usecases.
>>>>> For usecase #1, I hope to embed Cassandra into the same JVM as my
>>>>> application. I read on the web that CassandraDaemon can be used this way.
>>>>> Is that accurate? What other applications embed Cassandra this way? I
>>>>> *think* JetBrains Upsource does, but do you know other ones? (Incidentally,
>>>>> my Java application embeds Jetty webserver also).
>>>>> For usecase #2, I am hoping that I can deploy two instances of this
>>>>> ensemble and have the embedded Cassandra instances form a cluster. If I
>>>>> configure every write to be replicated on both nodes synchronously, then that
>>>>> will satisfy the HA needs of this usecase. Is it feasible to form clusters
>>>>> of embedded Cassandra instances?
>>>>> For usecase #3, I can form a large cluster of the ensemble where all
>>>>> writes are replicated synchronously to a quorum of nodes.
>>>>> Finally, in usecase #2 and #3, I'd like to use the failure detection
>>>>> and cluster membership dissemination infrastructure of Cassandra from
>>>>> within my application. Is it possible to be notified of membership changes
>>>>> when embedding Cassandra? I could use a separate library to do this (say,
>>>>> with JGroups or Akka) but I fear that if this library and the embedded
>>>>> Cassandra instances disagree, it could lead to subtle bugs.
>>>>> Thanks,
>>>>> Binil
>>>>> PS: Cross-posted at
>> --
>> - John


- John
