incubator-cassandra-user mailing list archives

From "Desimpel, Ignace" <Ignace.Desim...@nuance.com>
Subject RE: FW: Very slow batch insert using version 0.7.2
Date Fri, 11 Mar 2011 08:48:49 GMT
That is the number of records I need to add for each document, and we would like to test it
with 100K or more documents. That's why we thought Cassandra could be a good database
system.

At first I did the inserts one by one. Of course, doing them in a batch made the system a lot
faster, and it worked fine in version 0.6.x.
With your question in mind, I did some more tests (only on Windows XP):
1) Changed the code to insert two sets of about 50K. Same behavior in 0.7.x.
2) Then changed it to store 1000 records at a time. That seems a bit better: the rpc timeout
is no longer thrown. But the number of Memtable flushes and the number of commit logs
generated are still large, and the total time to write is still more than 10 minutes,
although it used to be less than 10 seconds.
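
For anyone who wants to reproduce test 2, the chunking itself can be sketched roughly as below. This is only an illustration in Python, not the code from the demo application: `chunked`, `insert_in_chunks` and the `send_batch` callback are hypothetical names, and `send_batch` stands in for whatever call actually submits one batch of row mutations.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_chunks(mutations, send_batch, size=1000):
    """Call send_batch once per chunk and return the number of calls made."""
    calls = 0
    for chunk in chunked(mutations, size):
        # Hypothetical callback: replace with the real batch insert call.
        send_batch(chunk)
        calls += 1
    return calls
```

With a size of 1000, the roughly 120K mutations would go out as about 120 smaller calls instead of one large one, so each call has far less work to finish before the rpc timeout applies.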


I do not know the Cassandra code, but I also have the system running in Eclipse. So if
needed I can debug the code, but I would need some input from your team.

Ignace


-----Original Message-----
From: Ryan King [mailto:ryan@twitter.com] 
Sent: Thursday, 10 March 2011 18:18
To: user@cassandra.apache.org
Cc: Desimpel, Ignace
Subject: Re: FW: Very slow batch insert using version 0.7.2

Why use such a large batch size?

-ryan

On Thu, Mar 10, 2011 at 6:31 AM, Desimpel, Ignace
<Ignace.Desimpel@nuance.com> wrote:
>
>
> Hello,
>
> I had a demo application with embedded cassandra version 0.6.x, inserting
> about 120 K  row mutations in one call.
>
> In version 0.6.x that usually took about 5 seconds, and I could repeat this
> step adding each time the same amount of data.
>
> Running on a single CPU computer, single hard disk, XP 32 bit OS, 1G memory
>
> I tested this again on CentOS 64 bit OS, 6G memory, different settings of
> memtable_throughput_in_mb and memtable_operations_in_millions.
>
> Also tried version 0.7.3. Also the same behavior.
>
>
>
> Now with version 0.7.2 the call returns with a timeout exception, even with
> a timeout of 120000 ms (2 minutes). I see the CPU going to 100%, a lot of
> disk writing (gigabytes), and a lot of log messages about compacting,
> flushing, commitlog, ...
>
>
>
> Below you can find some information using the nodetool at start of the batch
> mutation and also after 14 minutes. The MutationStage is clearly showing how
> slow the system handles the row mutations.
>
>
>
> Attached : Cassandra.yaml with at end the description of my database
> structure using yaml
>
> Attached : log file with cassandra output.
>
>
>
> Any idea what I could be doing wrong?
>
>
>
> Regards,
>
>
>
> Ignace Desimpel
>
>
>
> ignace.desimpel@nuance.com
>
>
>
> At start of the insert (after inserting 124360 row mutations) I get the
> following info from the nodetool :
>
>
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com info
>
> Starting NodeTool
>
> 34035877798200531112672274220979640561
>
> Gossip active    : true
>
> Load             : 5.49 MB
>
> Generation No    : 1299502115
>
> Uptime (seconds) : 1152
>
> Heap Memory (MB) : 179,84 / 1196,81
>
>
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com tpstats
>
> Starting NodeTool
>
> Pool Name                    Active   Pending      Completed
> ReadStage                         0         0          40637
> RequestResponseStage              0         0             30
> MutationStage                    32    121679          72149
> GossipStage                       0         0              0
> AntiEntropyStage                  0         0              0
> MigrationStage                    0         0              1
> MemtablePostFlusher               0         0              6
> StreamStage                       0         0              0
> FlushWriter                       0         0              5
> MiscStage                         0         0              0
> FlushSorter                       0         0              0
> InternalResponseStage             0         0              0
> HintedHandoff                     0         0              0
>
>
>
> After 14 minutes (timeout exception after 2 minutes : see log file) I get :
>
>
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com info
>
> Starting NodeTool
>
> 34035877798200531112672274220979640561
>
> Gossip active    : true
>
> Load             : 10.31 MB
>
> Generation No    : 1299502115
>
> Uptime (seconds) : 2172
>
> Heap Memory (MB) : 733,82 / 1196,81
>
>
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com tpstats
>
> Starting NodeTool
>
> Pool Name                    Active   Pending      Completed
> ReadStage                         0         0          40646
> RequestResponseStage              0         0             30
> MutationStage                    32    103310          90526
> GossipStage                       0         0              0
> AntiEntropyStage                  0         0              0
> MigrationStage                    0         0              1
> MemtablePostFlusher               0         0             69
> StreamStage                       0         0              0
> FlushWriter                       0         0             68
> FILEUTILS-DELETE-POOL             0         0             42
> MiscStage                         0         0              0
> FlushSorter                       0         0              0
> InternalResponseStage             0         0              0
> HintedHandoff                     0         0              0
>
>
>
>
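
As a rough reading of the two quoted snapshots, the MutationStage counters can be turned into an average drain rate. The Python sketch below is only back-of-the-envelope arithmetic: it assumes the "Uptime (seconds)" values from nodetool info are a good enough clock for the tpstats output taken right after them, and `drain_rate` is a hypothetical helper name. Under that assumption the node completed roughly 18 mutations per second between the snapshots.

```python
def drain_rate(completed_start, completed_end, seconds_start, seconds_end):
    """Average completed tasks per second between two snapshots."""
    elapsed = seconds_end - seconds_start
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")
    return (completed_end - completed_start) / elapsed


# MutationStage "Completed" and "Uptime (seconds)" from the two snapshots.
rate = drain_rate(72149, 90526, 1152, 2172)

# MutationStage "Pending" in the second snapshot.
pending = 103310

# Assumes the rate stays constant, which it may well not.
eta_minutes = pending / rate / 60

print(f"{rate:.1f} mutations/s, about {eta_minutes:.0f} minutes to drain the backlog")
```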
