cassandra-user mailing list archives

From Ted Zlatanov <...@lifelogs.com>
Subject Re: writes to Cassandra failing occasionally
Date Thu, 08 Apr 2010 13:07:12 GMT
On Wed, 07 Apr 2010 13:19:26 -0700 Mike Gallamore <mike.e.gallamore@googlemail.com> wrote:


MG> As an aside I modified some other code to use Net::Cassandra instead
MG> of Net::Cassandra::Easy and noticed that it seems to run 3-4X
MG> slower. Both aren't stunningly fast. The test clients are running on
MG> the same machine as Cassandra, and I'm only getting somewhere between
MG> 100-400 (huge variance) with N::C::Easy and 30-90 with N::C. This test
MG> is writing key value pairs, with the keys being an incrementing
MG> number, and the values being a log line from one of our systems (~200
MG> character string). I'm surprised there is such a huge difference in
MG> speed between the two modules and that the transactions per second are
MG> so low even on my 3.2GHz P4 2GB RAM box. I tried dropping the
MG> consistency level down to zero but it had a negligible effect.

First of all, Thrift and the way it's implemented in pure Perl
(Inline::C or XS would have been much better, plus the data structures
are horrible) are IMO the most annoying thing about working with
Cassandra.  I proposed a pluggable API mechanism so users don't have to
depend on Thrift but the proposal was rejected, so for now Thrift (with
the crash-on-demand feature) is the only actively developed Cassandra
API.  Avro is supposed to be happening soon and I look forward to that.

You should benchmark your code; make sure you're comparing apples to
apples.  N::C::Easy wraps the operations for you, always using multigets
and mutations on the backend.  I don't know how your Net::Cassandra test
is implemented.  It may be you're making multiple requests when you only
need one.  But more importantly, unless you fork multiple processes you
won't be winning any speed races.  Use Tie::ShareLite, for example, to
synchronize your data structures through shared memory.
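
To make the forking point concrete, here is a minimal, language-neutral
sketch in Python of measuring write throughput with one process versus
several forked workers.  Everything in it is an assumption made for
illustration: write_one() is a hypothetical stand-in that only sleeps
for about 1 ms instead of calling a real Cassandra client, the function
names are invented, and a pipe (not shared memory as with
Tie::ShareLite) carries each worker's count back to the parent.  It
relies on os.fork, so it runs on Unix-like systems only.

```python
import os
import time


def write_one(key, value):
    """Hypothetical stand-in for one client write; sleeps ~1 ms."""
    time.sleep(0.001)


def run_batch(start, count):
    """Write `count` key/value pairs with incrementing keys."""
    line = "x" * 200  # roughly the ~200 character log line from the test
    for key in range(start, start + count):
        write_one(key, line)
    return count


def forked_writes_per_second(total, workers):
    """Split `total` writes across `workers` forked processes.

    Returns (writes completed, writes per second of wall-clock time).
    """
    per_worker = total // workers
    began = time.perf_counter()
    children = []
    for i in range(workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: do the writes, report the count, exit immediately.
            os.close(read_fd)
            done = run_batch(i * per_worker, per_worker)
            os.write(write_fd, str(done).encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: keep only the read end of this worker's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    completed = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            completed += int(pipe.read())
        os.waitpid(pid, 0)
    elapsed = time.perf_counter() - began
    return completed, completed / elapsed


if __name__ == "__main__":
    for workers in (1, 4):
        done, rate = forked_writes_per_second(200, workers)
        print(f"{workers} worker(s): {done} writes, {rate:.0f} writes/sec")
```

Running the same harness against both modules, with the same number of
writes, the same payload and the same worker count, is one way to keep
the comparison apples to apples; only the body of write_one() would
change.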

If you can put together benchmarks that run against the default
(Keyspace1) configuration, I can try to optimize things.  I won't be
rewriting the Thrift side, so it will still be slow on
serialize/deserialize operations, but everything else will be fixed if
it's suboptimal.

Ted

