cassandra-user mailing list archives

From Jan Kesten <j.kes...@enercast.de>
Subject Re: Forming a cluster of embedded Cassandra instances
Date Mon, 15 Feb 2016 05:56:37 GMT
Hi,

Embedded Cassandra may well work for developers as a way to speed up getting started with the
project; we used it for JUnit tests. But with a simple clone and Maven build, I guess you will
end up with a single-node Cassandra cluster. Remember that Cassandra is a distributed database:
you need more than one node to get performance and fault tolerance. I also would not recommend
adding and removing cluster nodes at high frequency with application start-stop cycles.

To help people get up and running, provide a small README for downloading and starting
Cassandra. On Mac and Linux, unpacking the tar.gz and running bin/cassandra is not too complicated.
Or point to the DataStax Community Edition installers. Apart from installing Java, that
is a five-minute step to a single-node "Test Cluster".

Configuring a distributed setup ranges from a bit more to a lot more difficult and definitely
needs more understanding and planning.

Just as a hint, and off-topic: I have seen people use Cassandra as application glue for interprocess
communication, where every app server started a node (for communication, sessions, queues,
and so on). If that might be your use case, have a look at Hazelcast.

Jan

Sent from my iPhone

> On 14.02.2016 at 23:26, John Sanda <john.sanda@gmail.com> wrote:
> 
> The motivation was to make it easy for someone to get up and running quickly with the
project. Clone the git repo, run the maven build, and then you are all set. It definitely
does lower the learning curve for someone just getting started with a project and who is not
really thinking about Cassandra. It also is convenient for non-devs who need to quickly get
the project up and running. For development, we have people working on Linux, Mac OS X, and
Windows. I am not a Windows user and not even sure if ccm works on Windows, so ccm can't be
the de facto standard for development.
> 
>> On Sun, Feb 14, 2016 at 2:52 PM, Jack Krupansky <jack.krupansky@gmail.com>
wrote:
>> What motivated the use of an embedded instance for development - as opposed to simply
spawning a process for Cassandra?
>> 
>> 
>> 
>> -- Jack Krupansky
>> 
>>> On Sun, Feb 14, 2016 at 2:05 PM, John Sanda <john.sanda@gmail.com> wrote:
>>> The project I work on day to day uses an embedded instance of Cassandra, but
it is intended primarily for development. We embed Cassandra in a WildFly (i.e., JBoss)
server. It is packaged and deployed as an EAR. I personally do not do this. I use and recommend
ccm for development. If you do use WildFly, there is also wildfly-cassandra, which deploys
Cassandra as a custom WildFly extension. In other words it is deployed in WildFly like other
subsystems like EJB, web, etc, not like an application. There isn't a whole lot of active
development on this, but it could be another option.
>>> 
>>> For production, we have to support single node clusters (not embedded though),
and it has been challenging for pretty much all the reasons you find people saying not to
do so.
>>> 
>>> As for failure detection and cluster membership changes, are you using the DataStax
driver? You can register an event listener with the driver to receive notifications for those
things.
>>> 
>>>> On Sat, Feb 13, 2016 at 6:33 PM, Jonathan Haddad <jon@jonhaddad.com>
wrote:
>>>> +1 to what Jack said. Don't mess with embedded till you understand the basics
of the db. You're not making your system any less complex, I'd say you're most likely going
to shoot yourself in the foot. 
>>>>> On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky <jack.krupansky@gmail.com>
wrote:
>>>>> HA requires an odd number of replicas - 3, 5, 7 - so that split-brain
can be avoided. Two nodes would not support HA. You need to be able to reach a quorum, which
is defined as n/2+1 where n is the number of replicas. IOW, you cannot update the data if
a quorum cannot be reached. The data on any given node needs to be replicated on at least
two other nodes.
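
The quorum rule quoted above (n/2+1, with integer division) can be sketched in a few lines of Java. This is only an illustration of the arithmetic; `QuorumMath` and its method names are made up for this example and are not part of Cassandra or any driver.

```java
// Illustrative only: QuorumMath is a hypothetical helper, not a Cassandra API.
final class QuorumMath {
    // Quorum as defined in the message: n/2 + 1, using integer division.
    static int quorum(int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be >= 1");
        }
        return replicas / 2 + 1;
    }

    // Number of replicas that may be down while a quorum can still be reached.
    static int tolerableFailures(int replicas) {
        return replicas - quorum(replicas);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("replicas=%d quorum=%d tolerable failures=%d%n",
                    n, quorum(n), tolerableFailures(n));
        }
    }
}
```

With two replicas the quorum is two, so losing either node makes a quorum unreachable. Four replicas tolerate one failure, the same as three, which is why odd replica counts are usually suggested.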
>>>>> 
>>>>> Embedded Cassandra is only for extremely sophisticated developers - not
those who are new to Cassandra, with a "superficial understanding".
>>>>> 
>>>>> As a general proposition, you should not be running application code
on Cassandra nodes.
>>>>> 
>>>>> That said, if any of the senior Cassandra developers wish to personally
support your efforts towards embedded clusters, they are certainly free to do so. We'll see
if any of them step forward.
>>>>> 
>>>>> 
>>>>> -- Jack Krupansky
>>>>> 
>>>>>> On Sat, Feb 13, 2016 at 3:47 PM, Binil Thomas <binil.thomas.public@gmail.com>
wrote:
>>>>>> Hi all,
>>>>>> 
>>>>>> TL;DR: I have a very superficial understanding of Cassandra and am
currently evaluating it for a project. 
>>>>>> 
>>>>>> * Can Cassandra be embedded into another JVM application? 
>>>>>> * Can such embedded instances form a cluster? 
>>>>>> * Can the application use the failure detection and cluster membership
dissemination infrastructure of embedded Cassandra?
>>>>>> 
>>>>>> ----  
>>>>>> 
>>>>>> I am in the process of re-packaging a SaaS system written in Java
to be deployed on-premise by customers. The SaaS system currently uses AWS DynamoDB. The data
storage needs for this application are modest, but I would like to keep the deployment complexity
to a minimum. Here are three different usecases the on-premise system should support:
>>>>>> 
>>>>>> 1. single-node deployments with minimal complexity
>>>>>> 2. two-node HA deployments; the data and processing needs dictated
by the load on the system are well under what a single node can do, but the second node is
there to satisfy the HA requirement as a hot standby
>>>>>> 3. a multi-node clustered deployment, where higher operational complexity
is justified
>>>>>> 
>>>>>> I am considering Cassandra for these usecases. 
>>>>>> 
>>>>>> For usecase #1, I hope to embed Cassandra into the same JVM as my
application. I read on the web that CassandraDaemon can be used this way. Is that accurate?
What other applications embed Cassandra this way? I *think* JetBrains Upsource does, but do
you know other ones? (Incidentally, my Java application embeds Jetty webserver also). 
>>>>>> 
>>>>>> For usecase #2, I am hoping that I can deploy two instances of this
ensemble and have the embedded Cassandra instances form a cluster. If I configure every write
to be replicated on both nodes synchronously, then it will satisfy the HA needs of this usecase.
Is it feasible to form clusters of embedded Cassandra instances?
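
A rough sketch of what the two-node plan implies: if every write must be acknowledged by both replicas, a write cannot succeed while either node is down. The code below models only the acknowledgement counts; `TwoNodeWrites` and its `Level` enum are hypothetical names that merely mirror Cassandra's ONE, QUORUM, and ALL consistency levels, and the sketch assumes a replication factor of 2 for the two-node case.

```java
// Illustrative only: models acknowledgement counts, not real Cassandra behavior.
final class TwoNodeWrites {
    // Hypothetical enum; the names mirror Cassandra's consistency levels.
    enum Level { ONE, QUORUM, ALL }

    // How many replicas must acknowledge a write at the given level.
    static int requiredAcks(Level level, int replicationFactor) {
        switch (level) {
            case ONE:
                return 1;
            case QUORUM:
                return replicationFactor / 2 + 1;
            case ALL:
                return replicationFactor;
            default:
                throw new AssertionError(level);
        }
    }

    // True if enough replicas are alive to acknowledge the write.
    static boolean canWrite(Level level, int replicationFactor, int liveReplicas) {
        return liveReplicas >= requiredAcks(level, replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 2; // assumed: both nodes hold a copy of every row
        for (Level level : Level.values()) {
            System.out.printf("%s: both up=%b, one down=%b%n",
                    level, canWrite(level, rf, 2), canWrite(level, rf, 1));
        }
    }
}
```

With a replication factor of 2, both ALL and QUORUM need two acknowledgements, so only ONE keeps accepting writes when a node is down, and ONE does not guarantee the write reached both nodes synchronously.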
>>>>>> 
>>>>>> For usecase #3, I can form a large cluster of the ensemble where
all writes are replicated synchronously to a quorum of nodes. 
>>>>>> 
>>>>>> Finally, in usecase #2 and #3, I'd like to use the failure detection
and cluster membership dissemination infrastructure of Cassandra from within my application.
Is it possible to be notified of membership changes when embedding Cassandra? I could use
a separate library to do this (say, with JGroups or Akka) but I fear that if this library
and the embedded Cassandra instances disagree, it could lead to subtle bugs.
>>>>>> 
>>>>>> Thanks,
>>>>>> Binil
>>>>>> 
>>>>>> PS: Cross-posted at http://stackoverflow.com/questions/35384983/forming-a-cluster-of-embedded-cassandra-instances
>>> 
>>> 
>>> 
>>> -- 
>>> 
>>> - John
> 
> 
> 
> -- 
> 
> - John
