incubator-cassandra-dev mailing list archives

From Gary Dusbabek <>
Subject Re: Creating two instances in code
Date Fri, 13 Aug 2010 19:42:47 GMT
On Fri, Aug 13, 2010 at 11:28, Bjorn Borud <> wrote:
> Ryan Daum <> writes:
>> This is very discouraging; I've looked several times at this code and could
>> not believe my eyes in regard to the wanton use of global statics. In
>> addition to smelling bad, it makes it difficult to embed Cassandra. Is there
>> no will at all to fix this?
> I experienced all manner of problems when trying to embed Cassandra
> myself. The primary reason I wanted to embed Cassandra was for unit
> testing.
> Of course, reality came crashing in when I had more than one test and
> thus more than one embedded Cassandra instance.  I tried to look for
> quick solutions to this, but eventually flushed an entire week's work
> down the toilet and left for vacation.
> Okay, so here's what I would have wanted to do if I had the time:
>  - Go through the Cassandra code and remove singletons.
>  - Make Cassandra easier to embed by making starting and stopping work
>    properly (for some reason that I have forgotten, I had shutdown
>    and/or timing issues).  For servers to be embeddable, the start() and
>    stop()/shutdown() methods need to block until some known state is
>    reached.  (If shutdown() has to be slow because of work that needs
>    to be done before a safe shutdown, it may be a good idea to implement
>    kill() for unsafe shutdown, for instance when you know you will
>    nuke the data anyway.)
>  - Remove dependence on config files.  It should be possible to
>    just instantiate an embedded Cassandra server, pass it a config
>    object and then start it without having to touch the filesystem or
>    access any resource files for config. Depending on files or
>    resources for config is bad. (However, there is nothing wrong with
>    having a trivial API for reading files to produce a config object
>    you can then pass into Cassandra).
>    The detour I made into rendering an Apache Velocity template to
>    produce a storage-conf.xml only to have my embedded Cassandra
>    instance read it again was just silly.
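A minimal Java sketch of the embedding API described in the list above, assuming two hypothetical classes, `ServerConfig` and `EmbeddedServer` (neither is part of Cassandra). The configuration is a plain object built in code, an optional helper turns a properties file into that object, `start()` and `shutdown()` return only once the server has reached a known state, and `kill()` skips the (possibly slow) flush. The property names and the "work" done by the worker thread are placeholders.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

// Hypothetical config object: built in code, so the server itself never
// has to read storage-conf.xml or any other file or resource.
final class ServerConfig {
    final String host;
    final int storagePort;
    final String dataDir;

    ServerConfig(String host, int storagePort, String dataDir) {
        this.host = host;
        this.storagePort = storagePort;
        this.dataDir = dataDir;
    }

    // Optional "trivial API" that turns a file (or any Reader) into a config
    // object; the property names are made up for illustration.
    static ServerConfig fromProperties(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        return new ServerConfig(
                p.getProperty("host", "127.0.0.1"),
                Integer.parseInt(p.getProperty("storage_port", "7000")),
                p.getProperty("data_dir", "/tmp/cassandra-data"));
    }
}

// Hypothetical embeddable server whose lifecycle calls block until a known state.
final class EmbeddedServer {
    enum State { NEW, RUNNING, STOPPED }

    private final ServerConfig config;
    private volatile State state = State.NEW;
    private volatile boolean flushedOnLastStop;
    private Thread worker;

    EmbeddedServer(ServerConfig config) { this.config = config; }

    State state() { return state; }
    boolean flushedOnLastStop() { return flushedOnLastStop; }

    // Returns only after the server has reached RUNNING. Calling it again
    // after a stop starts a fresh worker thread.
    synchronized void start() throws InterruptedException {
        if (state == State.RUNNING) return;
        CountDownLatch ready = new CountDownLatch(1);
        worker = new Thread(() -> {
            // Real start-up work (opening ports, replaying logs) would go here.
            state = State.RUNNING;
            ready.countDown();
            try {
                Thread.sleep(Long.MAX_VALUE); // "serve" until interrupted
            } catch (InterruptedException ignored) {
                // interrupted by shutdown() or kill(): let the thread exit
            }
        }, "embedded-server-" + config.storagePort);
        worker.setDaemon(true);
        worker.start();
        ready.await();
    }

    // Safe shutdown: flush first, then return only after the worker has exited.
    synchronized void shutdown() throws InterruptedException { stop(true); }

    // Unsafe shutdown: skip the flush (e.g. the data will be deleted anyway),
    // but still wait for the worker thread to exit.
    synchronized void kill() throws InterruptedException { stop(false); }

    private void stop(boolean flush) throws InterruptedException {
        if (state != State.RUNNING) return;
        if (flush) flushToDisk();
        flushedOnLastStop = flush;
        worker.interrupt();
        worker.join();
        worker = null;
        state = State.STOPPED;
    }

    private void flushToDisk() {
        // Placeholder for writing in-memory data to disk before stopping.
    }
}
```

Because `shutdown()` leaves the instance in `STOPPED` rather than in some half-torn-down state, calling `start()` again simply spins up a new worker, which is the stop-then-start behavior a sequence of unit tests needs.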

I looked into doing this when I was first learning the code and had an
experience similar to yours.  At the time there wasn't much interest
in seeing it through to fruition, but maybe times have changed.

If I were to attempt it again, I would do it in this order:
1.  Make the config customizable.
2.  Make the services re-entrant (You should be able to start, stop,
then start again without problems).
3.  Get rid of the singletons.  This will involve coming up with a
smart way to couple instances of the services with each other.
4.  Integrate the storage port into how we canonically identify a node
(it's just the hostname now).
5.  While you're at it, figure out how to get JMX to bind to something
other than  (I hear it is possible, see
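Items 3 and 4 can be sketched in Java under the assumption that each node's services are wired together through a per-node context object instead of static singletons. `NodeId`, `Gossip`, `Storage`, and `NodeContext` are hypothetical illustrations, not Cassandra classes; the point is only that a node is identified by host plus storage port, and that every service receives the instances it depends on through its constructor. The sketch uses a Java 16+ record for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical node identity: host *and* storage port, so two nodes on the
// same host (or in the same JVM) are still distinct endpoints.
record NodeId(String host, int storagePort) {
    @Override
    public String toString() { return host + ":" + storagePort; }
}

// Illustrative stand-in for a service that would otherwise be a static singleton.
final class Gossip {
    private final NodeId self;
    private final Map<NodeId, Long> heartbeats = new ConcurrentHashMap<>();

    Gossip(NodeId self) { this.self = self; }

    NodeId self() { return self; }
    void recordHeartbeat(NodeId peer, long version) { heartbeats.put(peer, version); }
    int knownPeers() { return heartbeats.size(); }
}

// Illustrative stand-in for a service that would otherwise reach Gossip through
// a static accessor; here the instance it should talk to is passed in.
final class Storage {
    private final Gossip gossip;

    Storage(Gossip gossip) { this.gossip = gossip; }

    NodeId localEndpoint() { return gossip.self(); }
}

// One context per node couples that node's services to each other.
// Nothing is static, so several contexts can live side by side in one JVM.
final class NodeContext {
    final NodeId id;
    final Gossip gossip;
    final Storage storage;

    NodeContext(NodeId id) {
        this.id = id;
        this.gossip = new Gossip(id);
        this.storage = new Storage(gossip);
    }
}
```

With identities and services scoped this way, two contexts on 127.0.0.1 with storage ports 7000 and 7001 are distinct endpoints with independent gossip state, which is what multi-node tests inside a single JVM would require.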

> There are other valid reasons for wanting to embed Cassandra besides
> unit testing.  For instance, if you are writing an application that
> depends on Cassandra and you want the option of packaging it as a single
> binary for single-node experimentation, development and demo purposes.

I'd kind of like to see this too, although I admit that from the
pragmatic standpoint of running a Cassandra server, it represents a
whole lot of change for what amounts to very little tangible benefit.

From a development standpoint, the biggest benefit I see is that
we could write unit tests for small clusters that run on a single

One interesting thing that this would make possible is the ability to
have a node with more than one token in a single JVM.  Useful, who knows?  But
it is interesting because I think it would make Cassandra more elastic
(and could theoretically help with the hot-node problem when using


> So in short: yes, I am very, very interested in Cassandra being
> embeddable.
> -Bjørn
