incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Time to insert bulk data is very high comparing to database
Date Sun, 08 Nov 2009 20:30:37 GMT
On Sun, Nov 8, 2009 at 11:47 AM, Richard grossman <richiesgr@gmail.com> wrote:
>> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/200910.mbox
>
> Sorry, I can't find the post that talks about that; I can't open this link
> on Mac OS.

alternative link: http://markmail.org/message/tucee5bqiuz2og4z

> By parallel, do you mean writing code that runs the inserts in threads
> instead of one by one?

yes.

> If that's the case, is the Thrift API thread safe? How do you manage the
> opening and closing of the connection? For example, does a single thread
> open one and close it at the end?

you need one conn per thread.
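
A minimal sketch of the one-connection-per-thread pattern, assuming a
hypothetical FakeConnection class in place of a real Thrift client (its name
and its insert() method are illustrative, not part of any Cassandra API).
Each worker thread opens its own connection on first use and never shares it:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a client connection; not a real Cassandra API."""

    def __init__(self):
        self.owner = threading.get_ident()  # thread that opened the connection
        self.rows = []

    def insert(self, key, value):
        # Fail loudly if a connection is ever used from a second thread.
        assert threading.get_ident() == self.owner, "connection shared across threads"
        self.rows.append((key, value))


def bulk_insert(rows, num_threads=8, connect=FakeConnection):
    """Insert rows in parallel and return the list of connections opened."""
    local = threading.local()   # per-thread storage for that thread's connection
    opened = []                 # every connection opened, so the caller can close them
    lock = threading.Lock()     # protects `opened`

    def insert_row(item):
        conn = getattr(local, "conn", None)
        if conn is None:
            # First insert on this thread: open a connection for it alone.
            conn = local.conn = connect()
            with lock:
                opened.append(conn)
        key, value = item
        conn.insert(key, value)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for all inserts and re-raises any worker exception.
        list(pool.map(insert_row, rows))
    return opened
```

With a real client, the caller would close each returned connection once the
batch is done.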

> I've made a modification like this ... Is that what you had in mind?

yes.

> Anyway, I've opened a new small instance on Amazon to run the inserts (not
> one running Cassandra) and given it the IP of one of the Cassandra servers.
> It didn't improve anything. The client machine is at 1% CPU and the server
> machines are at 1% CPU.

right.  that's because you only have one client thread.

> The problem comes when the data is distributed between the 2 Cassandra
> servers: as long as the data goes to the commitlog of the first server,
> all is OK at ~2000 rows/second. But when the data goes to the second server,
> it falls very sharply to ~200 rows/second.

because you have the extra latency of forwarding the write to the
second machine.

but since cassandra itself is concurrent, throughput will go up
significantly as you add client threads.
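
A rough back-of-the-envelope sketch of why that is, assuming each client
thread issues blocking inserts one at a time and that per-insert latency stays
constant as threads are added (an idealization; a real cluster saturates at
some point). The ~200 rows/second figure is the single-thread rate quoted
above; the function name and the optional server_cap parameter are
illustrative:

```python
def estimated_throughput(num_threads, single_thread_rate, server_cap=None):
    """Idealized rows/second for num_threads blocking client threads.

    single_thread_rate: rows/second observed with one thread, i.e. 1 / latency.
    server_cap: optional assumed ceiling where the servers saturate.
    """
    ideal = num_threads * single_thread_rate
    return ideal if server_cap is None else min(ideal, server_cap)


# One thread at ~200 rows/second means roughly 5 ms per insert.
latency_ms = 1000 / 200

for threads in (1, 10, 50):
    print(threads, "threads ->", estimated_throughput(threads, 200), "rows/second (upper bound)")
```

These numbers are upper bounds for planning, not measurements.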

> I've read that I can check latency with JMX. That's fine, but I can't
> manage to connect to the JMX agent on Amazon. The params are OK but nothing
> helps; jconsole on my side refuses to connect. Is there something else I
> can check?

dunno, you sure it's not firewalled?
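
One quick way to test that, sketched with only the Python standard library:
try a plain TCP connection to the JMX port from the machine running jconsole.
The host and port are placeholders you would supply. A successful connect
only shows that this one port is open; JMX over RMI may use a second port as
well, so it does not prove jconsole will work.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If this returns False from your side but True from another EC2 instance, a
firewall or security group rule is the likely cause.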

-Jonathan
