incubator-cassandra-user mailing list archives

From Mike Gallamore <mike.e.gallam...@googlemail.com>
Subject Re: writes to Cassandra failing occasionally
Date Thu, 08 Apr 2010 18:50:38 GMT
I'll work on making a benchmark sometime later, but I don't think my
changes would be batched. My rows have only one column, and for this
test each row is accessed only once (when it is written). I pretty much
mapped directly over from a key-value store that was using memcache before.

It is something like:

key (5-10 chars)       log_line (~130 chars), e.g.

test-1    [2010-04-08 14:46:19 -0400] [6704] tc=4.3.48
i=119.93.29.92:4817 h=[UNAVAILABLE] o=N t= u= a= m= r= ir= l=CONN c=550
z="Connection refused" id=74a26fe4 q= x="RBL
action=reject;zen.dnsbl=127.0.0.4.reject;throttle.bobsbolts.com=127.0.0.99.throttle;t=0,0"
n=82/55/1 f=24 p=0 s=""   d=0.00

test-2    [2010-04-08 14:46:19 -0400] [6700] tc=4.3.48
i=204.16.8.227:29873 h=[UNAVAILABLE] o=N t= u= a= m= r= ir= l=CONN c=550
z="Connection refused" id=4e2c43a6 q= x="RBL
action=reject;zen.dnsbl=127.0.0.4.reject;zen.dnsbl=127.0.0.11.reject;throttle.bobsbolts.com=127.0.0.99.throttle;t=0,0"
n=154/115/1 f=24 p=0 s=""   d=0.03

Yes, I agree single-threaded is probably not the best. I wonder how much
of a performance hit it is on a single-CPU machine, though? I guess I
would still be blocking on RAM writes, but it isn't like there are
multiple CPUs I need to keep busy or anything.
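
One way to answer the single-CPU question is to measure it. The sketch
below is a minimal Python harness, not the Perl code under discussion: it
times the same number of writes with one process and then with several
forked workers. The function write_row is a hypothetical stand-in that
only sleeps for 1 ms to imitate a blocking client round trip; the key
format and the ~130-character value follow the layout above, and the
worker and write counts are arbitrary assumptions. If the real client
spends most of its time waiting on the round trip, several workers can
overlap those waits even on one CPU; if it is CPU-bound in
serialization, they would not help much on a single core.

```python
import time
from multiprocessing import Pool


def write_row(key, value):
    """Hypothetical stand-in for one single-column write.

    Replace the body with a real client call. The sleep is an assumption
    that imitates a blocking round trip of about 1 ms.
    """
    time.sleep(0.001)


def run_writes(args):
    """Write `count` rows starting at key index `start`; return the count."""
    start, count = args
    line = "x" * 130  # placeholder for a ~130-character log line
    for i in range(start, start + count):
        write_row("test-%d" % i, line)
    return count


def benchmark(total, workers):
    """Split `total` writes across `workers` processes.

    Returns (writes completed, writes per second).
    """
    per_worker = total // workers
    chunks = [(w * per_worker, per_worker) for w in range(workers)]
    t0 = time.perf_counter()
    if workers == 1:
        done = run_writes(chunks[0])  # plain single-process baseline
    else:
        with Pool(workers) as pool:
            done = sum(pool.map(run_writes, chunks))
    elapsed = time.perf_counter() - t0
    return done, done / elapsed


if __name__ == "__main__":
    for workers in (1, 4):
        done, rate = benchmark(200, workers)
        print("%d worker(s): %d writes, %.0f writes/s" % (workers, done, rate))
```

The numbers only become meaningful once write_row calls the real client
against a running node; with the sleep in place the harness shows just
the effect of overlapping blocked writes.
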
On 04/08/2010 06:07 AM, Ted Zlatanov wrote:
> On Wed, 07 Apr 2010 13:19:26 -0700 Mike Gallamore <mike.e.gallamore@googlemail.com> wrote:
>
> MG>  As an aside I modified some other code to use Net::Cassandra instead
> MG>  of Net::Cassandra::Easy and noticed that it seems to run 3-4X
> MG>  slower. Both aren't stunningly fast. The test clients are running on
> MG>  the same machine as Cassandra, and I'm only getting somewhere between
> MG>  100-400 (huge variance) with N::C::Easy and 30-90 with N::C. This test
> MG>  is writing key value pairs, with the keys being an incrementing
> MG>  number, and the values being a log line from one of our systems (~200
> MG>  character string). I'm surprised there is such a huge difference in
> MG>  speed between the two modules and that the transactions per second are
> MG>  so low even on my 3.2GHz P4 2GB RAM box. I tried dropping the
> MG>  consistency level down to zero but it had a negligible effect.
>
> First of all, Thrift and the way it's implemented in pure Perl
> (Inline::C or XS would have been much better, plus the data structures
> are horrible) are IMO the most annoying thing about working with
> Cassandra.  I proposed a pluggable API mechanism so users don't have to
> depend on Thrift but the proposal was rejected, so for now Thrift (with
> the crash-on-demand feature) is the only actively developed Cassandra
> API.  Avro is supposed to be happening soon and I look forward to that.
>
> You should benchmark your code; make sure you're comparing apples to
> apples.  N::C::Easy wraps the operations for you, always using multigets
> and mutations on the backend.  I don't know how your Net::Cassandra test
> is implemented.  It may be you're making multiple requests when you only
> need one.  But more importantly, unless you fork multiple processes you
> won't be winning any speed races.  Use Tie::ShareLite, for example, to
> synchronize your data structures through shared memory.
>
> If you can put together benchmarks that run against the default
> (Keyspace1) configuration, I can try to optimize things.  I won't be
> rewriting the Thrift side, so it will still be slow on
> serialize/deserialize operations, but everything else will be fixed if
> it's suboptimal.
>
> Ted
>
>    

