incubator-cassandra-dev mailing list archives

From Bjorn Borud <bbo...@gmail.com>
Subject Re: Creating two instances in code
Date Fri, 13 Aug 2010 16:28:10 GMT
Ryan Daum <ryan@thimbleware.com> writes:

> This is very discouraging; I've looked several times at this code and could
> not believe my eyes in regard to the wanton use of global statics. In
> addition to smelling bad, it makes it difficult to embed Cassandra. Is there
> no will at all to fix this?

I experienced all manner of problems when trying to embed Cassandra
myself. the primary reason I wanted to embed Cassandra was for unit
testing.

I was using the @Rule annotation in JUnit to let JUnit create a unique
temporary directory for the Cassandra instance. Once I had a temp dir I
then created the needed directories and used the Apache Velocity
templating engine to produce a storage-conf.xml with absolute paths to
the various directories for commit logs, data etc. once the tests are
done the framework takes care of cleaning up the files. this also
ensures that if I run several tests in parallel I get separate unique
temp directories for each instance. (I saw Ran Tavory had contributed a
DataCleaner class (or whatever it was named) to do something similar, but I
didn't want to use that since JUnit already has the needed mechanisms
for doing this. besides, I didn't like relying on a single testing
directory.)
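
A minimal sketch of the per-instance directory setup described above,
using only java.nio.file so it stands alone. In the real tests JUnit's
temporary-folder rule did the directory creation and cleanup, and
Velocity did the rendering; here plain string substitution stands in for
Velocity. The class name InstanceDirs, the "cassandra-test-" prefix and
the XML fragment are assumptions for illustration; the fragment is not a
complete storage-conf.xml.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: a private directory tree plus rendered config per instance. */
final class InstanceDirs {
    // Illustrative fragment only; a real storage-conf.xml needs many more settings.
    private static final String TEMPLATE =
        "<Storage>\n"
        + "  <CommitLogDirectory>${commitlog}</CommitLogDirectory>\n"
        + "  <DataFileDirectories>\n"
        + "    <DataFileDirectory>${data}</DataFileDirectory>\n"
        + "  </DataFileDirectories>\n"
        + "</Storage>\n";

    final Path root;
    final Path commitLog;
    final Path data;
    final Path configFile;
    final String renderedConfig;

    private InstanceDirs(Path root) throws IOException {
        this.root = root;
        this.commitLog = Files.createDirectories(root.resolve("commitlog"));
        this.data = Files.createDirectories(root.resolve("data"));
        // Literal string substitution stands in for the Velocity rendering step.
        this.renderedConfig = TEMPLATE
            .replace("${commitlog}", commitLog.toString())
            .replace("${data}", data.toString());
        this.configFile = root.resolve("storage-conf.xml");
        Files.write(configFile, renderedConfig.getBytes(StandardCharsets.UTF_8));
    }

    /** Creates a fresh, uniquely named tree under the system temp directory. */
    static InstanceDirs create() {
        try {
            Path root = Files.createTempDirectory("cassandra-test-").toAbsolutePath();
            return new InstanceDirs(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Removes the whole tree, children before parents. */
    void delete() {
        try {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(root)) {
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would call InstanceDirs.create() in its setup and delete() in its
teardown, so two tests running in parallel never share a directory.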

of course, reality came crashing in when I had more than one test and
thus more than one embedded Cassandra instance.  I tried to look for
quick solutions to this, but eventually flushed an entire week's work
down the toilet and left for vacation.

now we plan to take an inferior approach to the testing simply because
we've run out of time to get this done properly.  (In an ideal world I
would be able to sit down with the Cassandra code, rewrite the parts
that are "misbehaving" and work with someone to get the code reviewed).

okay, so what I would have wanted to do if I had the time:

  - go through the Cassandra code and remove singletons.

  - make Cassandra easier to embed by making starting and stopping work
    properly.  (for some reason that I have forgotten I had shutdown
    and/or timing issues.)  for servers to be embeddable the start() and
    stop()/shutdown() methods need to block until some known state is
    reached.  (if shutdown() has to be slow because of work that needs
    to be done before safe shutdown it may be an idea to implement
    kill() for unsafe shutdown -- for instance when you know you will
    nuke the data anyway.)

  - Remove dependence on config files.  It should be possible to
    just instantiate an embedded Cassandra server, pass it a config
    object and then start it without having to touch the filesystem or
    access any resource files for config. Depending on files or
    resources for config is bad. (However, there is nothing wrong with
    having a trivial API for reading files to produce a config object
    you can then pass into Cassandra).
    The detour I made into rendering an Apache Velocity template to
    produce a storage-conf.xml only to have my embedded Cassandra
    instance read it again was just silly.
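
To make the three wishes above concrete, here is a rough sketch of the
shape such an API could take. None of this is Cassandra code:
ServerConfig, EmbeddedServer and their methods are hypothetical names,
the worker thread only stands in for a real server, and the port numbers
in the usage note are never actually bound. The points it tries to show
are that all state is per-instance (no statics), that the config arrives
as an object, and that start(), stop() and kill() each block until a
known state is reached.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical config object, built in code rather than read from a file. */
final class ServerConfig {
    final String dataDir;
    final String commitLogDir;
    final int port;

    ServerConfig(String dataDir, String commitLogDir, int port) {
        this.dataDir = dataDir;
        this.commitLogDir = commitLogDir;
        this.port = port;
    }
}

/** Hypothetical embeddable server; every field is per-instance, nothing is static. */
final class EmbeddedServer {
    enum State { NEW, RUNNING, STOPPED }

    private final ServerConfig config;
    private final CountDownLatch started = new CountDownLatch(1);
    private final CountDownLatch stopSignal = new CountDownLatch(1);
    private volatile State state = State.NEW;
    private volatile boolean flushed;
    private Thread worker;

    EmbeddedServer(ServerConfig config) {
        this.config = config;
    }

    /** Blocks until the instance has reached RUNNING. */
    synchronized void start() {
        if (worker != null) throw new IllegalStateException("already started");
        worker = new Thread(this::run, "embedded-" + config.port);
        worker.start();
        try {
            started.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while starting", e);
        }
    }

    /** Safe shutdown: lets the worker flush, then blocks until STOPPED. */
    synchronized void stop() {
        stopSignal.countDown();
        joinWorker();
    }

    /** Unsafe shutdown: skips the flush, then blocks until STOPPED. */
    synchronized void kill() {
        if (worker != null) worker.interrupt();
        joinWorker();
    }

    State state() { return state; }
    boolean flushed() { return flushed; }
    ServerConfig config() { return config; }

    private void run() {
        state = State.RUNNING;
        started.countDown();
        try {
            stopSignal.await();   // stand-in for serving requests
            flushed = true;       // stand-in for flushing data before a safe stop
        } catch (InterruptedException e) {
            // kill() was called: exit without flushing
        } finally {
            state = State.STOPPED;
        }
    }

    private void joinWorker() {
        if (worker == null) throw new IllegalStateException("not started");
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while stopping", e);
        }
    }
}
```

With something like this, a test could create two instances with
different ServerConfig objects (say ports 9160 and 9161), start both,
stop() one and kill() the other, and rely on each call having finished
by the time it returns.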


there are other valid reasons for wanting to embed Cassandra besides
unit testing.  for instance, if you are writing an application that
depends on Cassandra and you want the option of packaging it as a single
binary for single node experimentation, development and demo purposes.  

as an example, I am currently working on a project where I have a server
that will be talking to a Cassandra cluster of half a dozen nodes. but
other development projects depend on this server, so they need some
quick way of getting it up and running on their own workstations and
laptops -- so they can start the server with a command line option that
says "use an embedded Cassandra server". of course, in unit tests they
also want to be able to embed my server and, of course, Cassandra.
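
The command line switch could look roughly like the sketch below. The
flag name "--embedded-cassandra" and the StorageBackend, RemoteCluster,
EmbeddedNode and BackendSelector types are all made up for illustration;
they only show the server picking a backend at startup, not how either
backend would really be wired up.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical handle the server uses to reach its storage. */
interface StorageBackend {
    String describe();
}

/** Stand-in for a client talking to a real multi-node cluster. */
final class RemoteCluster implements StorageBackend {
    private final List<String> hosts;

    RemoteCluster(List<String> hosts) {
        this.hosts = hosts;
    }

    public String describe() {
        return "remote cluster " + hosts;
    }
}

/** Stand-in for a Cassandra node running inside the same JVM. */
final class EmbeddedNode implements StorageBackend {
    public String describe() {
        return "embedded single node";
    }
}

final class BackendSelector {
    // Hypothetical flag name, not an existing option of any tool.
    static final String EMBEDDED_FLAG = "--embedded-cassandra";

    /** Picks the embedded node when the flag is present, the cluster otherwise. */
    static StorageBackend fromArgs(String[] args, List<String> clusterHosts) {
        boolean embedded = Arrays.asList(args).contains(EMBEDDED_FLAG);
        return embedded ? new EmbeddedNode() : new RemoteCluster(clusterHosts);
    }
}
```

The rest of the server only sees StorageBackend, so developers on a
laptop and the production deployment run the same code path apart from
this one choice.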

I've done this a few times with Apache Derby -- to give users the option
of running with an embedded SQL server if they don't want the hassle of
setting up a MySQL instance, or of firing up the application and having
it talk to a MySQL instance.


so in short:  yes, I am very, very interested in Cassandra being
embeddable, I am very interested in being able to have more than one
Cassandra instance in the same JVM and I am very interested in being
able to programmatically configure Cassandra rather than messing with
config files.  :-)

sorry for not having more time to actually go and do these things rather
than whine about them.

-Bjørn

