incubator-cassandra-user mailing list archives

From Mark Jones <MJo...@imagehawk.com>
Subject RE: Slow Responses from 2 of 3 nodes in RC1
Date Fri, 02 Apr 2010 20:39:02 GMT
To further complicate matters,
  when I read only from cassdb1, I can check about 100/second/thread (40 threads)
  when I read only from cassdb2, I can check about 50/second/thread (40 threads)
  when I read only from cassdb3, I can check about 50/second/thread (40 threads)

This is with a consistency level of ONE, ALL, or QUORUM. All 3 levels return about the
same read rate (~5/second), yet 2 nodes return them at half the speed of the other node.

I don't understand how this could be, since QUORUM or ALL would require at least 2 of the 3
to respond in all cases, so you would expect the read rate to be the same (either
50/second/thread or 100/second/thread) regardless of which node acts as the proxy.
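
To make the expectation above concrete, here is a minimal sketch of how many replica responses a read waits for at each consistency level. It assumes a replication factor of 3, which the thread does not state, and `replicas_required` is a hypothetical helper, not part of any Cassandra client API.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replica responses a read waits for at a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


RF = 3  # assumption: every key is replicated to all three nodes

for level in ("ONE", "QUORUM", "ALL"):
    print(level, replicas_required(level, RF))
```

Under that assumption QUORUM waits for 2 replicas and ALL for 3, so with only one fast node a slower node would be on the read path for either level, whichever node coordinates the request.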

-----Original Message-----
From: Mark Jones [mailto:MJones@imagehawk.com]
Sent: Friday, April 02, 2010 1:38 PM
To: user@cassandra.apache.org
Subject: Slow Responses from 2 of 3 nodes in RC1

I have a 3 node cassandra cluster I'm trying to work with:

All three machines are about the same:
6-8 GB of RAM per machine (the fastest machine has 8 GB, with the JVM limited to 5 GB)
separate spindles for Cassandra data and commit log

I wrote ~7 million items to Cassandra and am now trying to read them back. The ones that are
missing might be troubling, but I'm not worried about that yet. Part of the reason I only
have ~7 million items in is that 2 of the nodes are NOT pulling their weight.


I've used "nodetool loadbalance" on them to even out the data. It was terribly imbalanced
after ingestion, but it now looks like this:

Address       Status     Load          Range                                      Ring
                                       169214437894733073017295274330696200891
192.168.1.116 Up         1.88 GB       83372832363385696737577075791407985563     |<--|  (cassdb2)
192.168.1.119 Up         2.59 GB       167732545381904888270252256443838855184    |   |  (cassdb3)
192.168.1.12  Up         2.5 GB        169214437894733073017295274330696200891    |-->|  (cassdb1)
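
One way to read the ring output is to work out how much of the token space falls between each node's token and its predecessor's. The sketch below does that under the assumption that the cluster uses RandomPartitioner with a token space of 2^127; the thread does not say which partitioner is in use, and `primary_ownership` is a hypothetical helper.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token space

# Tokens copied from the ring output above.
TOKENS = [
    ("cassdb2", 83372832363385696737577075791407985563),
    ("cassdb3", 167732545381904888270252256443838855184),
    ("cassdb1", 169214437894733073017295274330696200891),
]


def primary_ownership(tokens, ring_size=RING_SIZE):
    """Fraction of the ring between each node's token and its predecessor's."""
    ordered = sorted(tokens, key=lambda item: item[1])
    shares = {}
    for i, (name, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # i == 0 wraps around to the last node
        shares[name] = Fraction((token - prev_token) % ring_size, ring_size)
    return shares


for name, share in primary_ownership(TOKENS).items():
    print(f"{name}: {float(share):.2%}")
```

If the assumption holds, the primary range for cassdb1 comes out far smaller than the other two even though the Load column looks even. Replication could explain the similar on-disk sizes, so this is only a starting point for investigation.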

This is a summary report from my checking program (C++). It runs one thread per file (the
files contain the originally ingested data), checking that the inserted data is present and
unchanged. Each thread sets up its own Thrift and Cassandra connection. The connection point
is chosen randomly at startup and reused by that thread until the end of the test. All the
threads run simultaneously, so I would expect similar results, but one node is beating the
pants off the other two nodes for performance.

In the logs, there is nothing but INFO lines like these (others give less information
about performance). There are no exceptions or warnings:
cassdb1:
INFO [COMPACTION-POOL:1] 2010-04-02 08:20:35,339 CompactionManager.java (line 326) Compacted to /cassandra/data/bumble/Contacts-15-Data.db.  262279345/243198299 bytes for 324378 keys.  Time: 16488ms.

cassdb2:
INFO [COMPACTION-POOL:1] 2010-04-02 08:20:16,448 CompactionManager.java (line 326) Compacted to /cassandra/data/bumble/Contacts-5-Data.db.  251086153/234535924 bytes for 284088 keys.  Time: 22805ms.

cassdb3:
INFO [COMPACTION-POOL:1] 2010-04-02 08:20:24,429 CompactionManager.java (line 326) Compacted to /cassandra/data/bumble/Contacts-20-Data.db.  266451419/248084737 bytes for 347531 keys.  Time: 25094ms.
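
The three compaction lines can be compared more directly by turning them into rates. This is a rough sketch: it assumes the first number in the "N/M bytes" pair is the amount of data the compaction processed (the message does not say what the two numbers mean), and `compaction_rate` is a hypothetical helper.

```python
import re

# Compaction summary lines copied from the message, rejoined onto single lines.
LOG_LINES = {
    "cassdb1": "INFO [COMPACTION-POOL:1] 2010-04-02 08:20:35,339 CompactionManager.java "
               "(line 326) Compacted to /cassandra/data/bumble/Contacts-15-Data.db.  "
               "262279345/243198299 bytes for 324378 keys.  Time: 16488ms.",
    "cassdb2": "INFO [COMPACTION-POOL:1] 2010-04-02 08:20:16,448 CompactionManager.java "
               "(line 326) Compacted to /cassandra/data/bumble/Contacts-5-Data.db.  "
               "251086153/234535924 bytes for 284088 keys.  Time: 22805ms.",
    "cassdb3": "INFO [COMPACTION-POOL:1] 2010-04-02 08:20:24,429 CompactionManager.java "
               "(line 326) Compacted to /cassandra/data/bumble/Contacts-20-Data.db.  "
               "266451419/248084737 bytes for 347531 keys.  Time: 25094ms.",
}

PATTERN = re.compile(r"(\d+)/(\d+) bytes for (\d+) keys\.\s+Time: (\d+)ms")


def compaction_rate(line):
    """Return (MB per second, keys per second) for one compaction summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a compaction summary line")
    first_bytes, _second_bytes, keys, millis = map(int, match.groups())
    seconds = millis / 1000
    # Assumption: the first byte count is the volume of data processed.
    return first_bytes / seconds / 1e6, keys / seconds


for node, line in LOG_LINES.items():
    mb_per_s, keys_per_s = compaction_rate(line)
    print(f"{node}: {mb_per_s:.1f} MB/s, {keys_per_s:.0f} keys/s")
```

The sizes are similar on all three nodes, so the elapsed times alone already show cassdb1 finishing a comparable compaction noticeably faster than cassdb2 and cassdb3.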


How do I go about figuring out what is going on in this setup?

The iostat -x output is at the bottom.

cassdb1 Checked:    9773 Good:    9770 Missing:       3 Miscompared:        0  '/tmp/QUECD0000000005
cassdb1 Checked:    9818 Good:    9817 Missing:       1 Miscompared:        0  '/tmp/QUEDE0000000005
cassdb1 Checked:    9820 Good:    9820 Missing:       0 Miscompared:        0  '/tmp/QUEQ0000000005
cassdb1 Checked:    9836 Good:    9836 Missing:       0 Miscompared:        0  '/tmp/QUEJ0000000005
cassdb1 Checked:    9843 Good:    9843 Missing:       0 Miscompared:        0  '/tmp/QUEFG0000000005
cassdb1 Checked:    9883 Good:    9883 Missing:       0 Miscompared:        0  '/tmp/QUENO0000000005
cassdb1 Checked:    9884 Good:    9883 Missing:       1 Miscompared:        0  '/tmp/QUEIJ0000000005
cassdb1 Checked:    9890 Good:    9890 Missing:       0 Miscompared:        0  '/tmp/QUER0000000005
cassdb1 Checked:    9915 Good:    9913 Missing:       2 Miscompared:        0  '/tmp/QUEMN0000000005
cassdb1 Checked:    9962 Good:    9962 Missing:       0 Miscompared:        0  '/tmp/QUEF0000000005
cassdb1 Checked:   10120 Good:   10120 Missing:       0 Miscompared:        0  '/tmp/QUEH0000000005
cassdb1 Checked:   10123 Good:   10123 Missing:       0 Miscompared:        0  '/tmp/QUEM0000000005
cassdb1 Checked:   10280 Good:   10280 Missing:       0 Miscompared:        0  '/tmp/QUEP0000000005
cassdb1 Checked:   10490 Good:   10490 Missing:       0 Miscompared:        0  '/tmp/QUEL0000000005

cassdb2 Checked:       0 Good:       0 Missing:       0 Miscompared:        0  '/tmp/QUEC0000000005
cassdb2 Checked:       1 Good:       1 Missing:       0 Miscompared:        0  '/tmp/QUEN0000000005
cassdb2 Checked:       2 Good:       2 Missing:       0 Miscompared:        0  '/tmp/QUEEF0000000005
cassdb2 Checked:       2 Good:       2 Missing:       0 Miscompared:        0  '/tmp/QUEG0000000005
cassdb2 Checked:       3 Good:       3 Missing:       0 Miscompared:        0  '/tmp/QUEV0000000005
cassdb2 Checked:       3 Good:       3 Missing:       0 Miscompared:        0  '/tmp/QUEX0000000005
cassdb2 Checked:       4 Good:       4 Missing:       0 Miscompared:        0  '/tmp/QUEB0000000005
cassdb2 Checked:       4 Good:       4 Missing:       0 Miscompared:        0  '/tmp/QUEBC0000000005
cassdb2 Checked:       5 Good:       5 Missing:       0 Miscompared:        0  '/tmp/QUEAB0000000005
cassdb2 Checked:       5 Good:       5 Missing:       0 Miscompared:        0  '/tmp/QUET0000000005
cassdb2 Checked:       6 Good:       6 Missing:       0 Miscompared:        0  '/tmp/QUEJK0000000005
cassdb2 Checked:       7 Good:       7 Missing:       0 Miscompared:        0  '/tmp/QUEO0000000005
cassdb2 Checked:       9 Good:       9 Missing:       0 Miscompared:        0  '/tmp/QUED0000000005
cassdb2 Checked:      10 Good:      10 Missing:       0 Miscompared:        0  '/tmp/QUEK0000000005

cassdb3 Checked:      13 Good:      13 Missing:       0 Miscompared:        0  '/tmp/QUEHI0000000005
cassdb3 Checked:      17 Good:      17 Missing:       0 Miscompared:        0  '/tmp/QUES0000000005
cassdb3 Checked:      18 Good:      18 Missing:       0 Miscompared:        0  '/tmp/QUEI0000000005
cassdb3 Checked:      19 Good:      19 Missing:       0 Miscompared:        0  '/tmp/QUEW0000000005
cassdb3 Checked:      20 Good:      20 Missing:       0 Miscompared:        0  '/tmp/QUEE0000000005
cassdb3 Checked:      20 Good:      20 Missing:       0 Miscompared:        0  '/tmp/QUEY0000000005
cassdb3 Checked:      21 Good:      21 Missing:       0 Miscompared:        0  '/tmp/QUEA0000000005
cassdb3 Checked:      21 Good:      21 Missing:       0 Miscompared:        0  '/tmp/QUELM0000000005
cassdb3 Checked:      21 Good:      21 Missing:       0 Miscompared:        0  '/tmp/QUEU0000000005
cassdb3 Checked:      23 Good:      23 Missing:       0 Miscompared:        0  '/tmp/QUEGH0000000005
cassdb3 Checked:      23 Good:      23 Missing:       0 Miscompared:        0  '/tmp/QUEKL0000000005
cassdb3 Checked:      23 Good:      23 Missing:       0 Miscompared:        0  '/tmp/QUEZ0000000005
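
The per-thread report is easier to compare once it is totalled per node. The following is a small sketch of such a summary; the regular expression and the `summarize` helper are illustrative and not part of the original checking program, and `SAMPLE` holds only a few of the report lines above.

```python
import re
from collections import defaultdict

REPORT_LINE = re.compile(
    r"^(?P<node>\S+)\s+Checked:\s+(?P<checked>\d+)\s+Good:\s+(?P<good>\d+)"
    r"\s+Missing:\s+(?P<missing>\d+)\s+Miscompared:\s+(?P<miscompared>\d+)"
)
FIELDS = ("checked", "good", "missing", "miscompared")


def summarize(lines):
    """Total the counters per node and count how many threads reported."""
    totals = defaultdict(lambda: dict.fromkeys(("threads",) + FIELDS, 0))
    for line in lines:
        match = REPORT_LINE.match(line)
        if match is None:
            continue  # skip blank or unrelated lines
        row = totals[match["node"]]
        row["threads"] += 1
        for field in FIELDS:
            row[field] += int(match[field])
    return dict(totals)


# A few lines copied from the report above.
SAMPLE = [
    "cassdb1 Checked:    9773 Good:    9770 Missing:       3 Miscompared:        0  '/tmp/QUECD0000000005",
    "cassdb1 Checked:    9818 Good:    9817 Missing:       1 Miscompared:        0  '/tmp/QUEDE0000000005",
    "cassdb2 Checked:      10 Good:      10 Missing:       0 Miscompared:        0  '/tmp/QUEK0000000005",
    "cassdb3 Checked:      13 Good:      13 Missing:       0 Miscompared:        0  '/tmp/QUEHI0000000005",
]

for node, row in sorted(summarize(SAMPLE).items()):
    print(node, row)
```

Feeding the whole report through `summarize` gives a thread count and a total per node, which makes the gap between cassdb1 and the other two nodes easy to quantify.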

cassdb1:
Linux 2.6.31-14-generic (record)        04/02/2010      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.81    0.09    2.23    0.36    0.00   90.51

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.27    25.61    0.89    0.77    27.80   210.90   143.64     0.05   27.35   4.67   0.78
sdb               0.02     0.00    0.05    0.00     0.78     0.02    15.29     0.00   10.96  10.95   0.06
sdc               0.02    35.43    0.01    0.75     0.15   718.84   947.70     0.28  371.59   4.36   0.33

cassdb2:
Linux 2.6.31-14-generic (ec2)   04/02/2010      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.34    0.01    1.50    3.85    0.00   85.31

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               2.51     0.47    3.35    0.69   176.95    11.07    46.50     0.32   79.83   5.22   2.11
sdb               0.06     0.00    0.02    0.00     0.60     0.00    32.17     0.00    2.83   2.83   0.01
sdc               0.21     0.00   23.43    0.01  1581.80     1.74    67.55     0.45   19.18   3.67   8.61

cassdb3:
Linux 2.6.31-14-generic (ec1)   04/02/2010      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.57    0.12    1.91    0.35    0.00   85.06

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.45    32.56    0.83    0.89    31.44   267.46   173.90     0.04   20.99   4.45   0.77
sdb               0.02    37.61    1.35    0.79    26.30   674.95   327.98     0.28  133.03   3.36   0.72
