cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: Time to insert bulk data is very high comparing to database
Date Sun, 08 Nov 2009 20:30:37 GMT
On Sun, Nov 8, 2009 at 11:47 AM, Richard grossman <> wrote:
> Sorry, I can't find the post talking about that, and I can't open this link
> on Mac OS.

alternative link:

> By parallel, do you mean writing code that runs the inserts in threads
> instead of one by one?


> If that's the case, is the Thrift API thread safe? How do you manage the
> opening and closing of the connection? Like a single thread that opens one
> and closes it at the end?

you need one conn per thread.
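
A minimal sketch of the one-connection-per-thread pattern, in Python for
brevity. `FakeConnection` is a hypothetical stand-in for whatever Thrift
client object you actually open (it is not a real Cassandra or Thrift API),
and the thread count is an arbitrary assumption. The point is only that each
worker thread lazily opens its own connection and reuses it for every insert
it performs, so no connection is ever shared between threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a Thrift client connection (not a real API)."""
    opened = 0
    _lock = threading.Lock()

    def __init__(self):
        # Count how many connections get opened across all threads.
        with FakeConnection._lock:
            FakeConnection.opened += 1
        self.rows = []

    def insert(self, key, value):
        # A real client would send the write to Cassandra here.
        self.rows.append((key, value))


_local = threading.local()


def get_connection():
    # Each worker thread opens its own connection once, then reuses it.
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = _local.conn = FakeConnection()
    return conn


def insert_row(item):
    key, value = item
    get_connection().insert(key, value)
    return key


def bulk_insert(rows, num_threads=8):
    # Spread the inserts over a fixed pool of worker threads.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(insert_row, rows))
```

With a real client you would also close each thread's connection once that
thread has finished its share of the inserts.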

> I've made a modification like this ... Is this what you had in mind?


> Anyway, I've opened a new small instance on Amazon to run the inserts, not one
> running Cassandra, and gave it the IP of one of the Cassandra servers. It
> didn't improve anything. The client machine is at 1% CPU and the server
> machines are at 1% CPU.

right.  that's because you only have one client thread.

> The problem comes when the data is distributed between the 2 Cassandra
> servers. As long as the data goes to the commitlog of the first server,
> all is OK at ~2000 rows/second. But when the data goes to the second server,
> it falls very sharply to ~200 rows/second.

because you have the extra latency of forwarding the write to the
second machine.

but since cassandra itself is concurrent, throughput will go up
significantly as you add client threads.
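
A rough back-of-the-envelope version of that argument, using the ~2000 and
~200 rows/second figures quoted above. It assumes a single client thread that
blocks on each insert, so throughput is simply the inverse of per-insert
latency, and it assumes the servers are nowhere near saturated (consistent
with the 1% CPU reported), so adding threads scales roughly linearly. Real
scaling will flatten out at some point; this is only an idealized sketch.

```python
def latency_ms(rows_per_second):
    """Average per-insert latency implied by one blocking client thread."""
    return 1000.0 / rows_per_second


def ideal_throughput(rows_per_second_one_thread, threads):
    """Idealized estimate: assumes unsaturated servers and linear scaling."""
    return rows_per_second_one_thread * threads


local_ms = latency_ms(2000)      # write lands on the node you connected to
forwarded_ms = latency_ms(200)   # write is forwarded to the second node

print("local write:     %.1f ms per insert" % local_ms)
print("forwarded write: %.1f ms per insert" % forwarded_ms)
print("10 threads, forwarded: ~%d rows/second (idealized)"
      % ideal_throughput(200, 10))
```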

> I've read that I can check latency with JMX. That's fine, but I can't manage
> to connect to the JMX agent on Amazon. The params are OK, but nothing helps;
> jconsole on my side refuses to connect. Is there something else I can check?

dunno, you sure it's not firewalled?
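
One quick way to test that from the machine where jconsole runs is to try a
plain TCP connect to the JMX port. A small sketch in Python; the host and port
in the usage line are placeholders, so substitute the server's address and
whatever port your JMX params actually specify.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Placeholder usage: replace with your server address and JMX port.
# print(port_open("your-cassandra-host", 12345))
```

If this returns False from your side but True when run on the server itself,
something between the two machines is blocking the port.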

