We're also seeing something similar since upgrading to 1.0.0.

We have a 6-node cluster with a replication factor of 3. Three of the nodes are older and run 32-bit Windows Server 2008 with 32-bit Java; the other three are newer and run 64-bit Windows Server 2008 R2 with 64-bit Java. We are *not* using compression and we are *not* using leveled compaction, yet we also see nodetool ring and nodetool info report the wrong load: it grows faster than actual disk usage. Restarting a node restores the reported load to the correct number.

However, this only happens on the newer nodes running 64-bit java, not on the older nodes running 32-bit.

Nodetool ring reports:
10.0.0.57       datacenter1 rack1       Up     Normal  25.7 GB         16.67%
10.0.0.50       datacenter1 rack1       Up     Normal  12.34 GB        16.67%
10.0.0.58       datacenter1 rack1       Up     Normal  11.74 GB        16.67%
10.0.0.51       datacenter1 rack1       Up     Normal  12.25 GB        16.67%
10.0.0.56       datacenter1 rack1       Up     Normal  17.94 GB        16.67%
10.0.0.52       datacenter1 rack1       Up     Normal  12.56 GB        16.67%

.56, .57 and .58 are the newer nodes. I restarted .58, after which it reports the correct size, while .57 and .56 still report the wrong size. This is after about a week of uptime for all nodes, and the bug makes the newer nodes report about twice the actual data size.
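
For anyone who wants to check their own cluster for the same pattern, here is a rough Python sketch that parses ring output in the column layout shown above and flags nodes whose reported load is well above the cluster median. The function names and the 1.3x threshold are hypothetical choices for illustration, not anything Cassandra provides, and the parsing assumes the load is printed as a number followed by KB, MB, GB or TB.

```python
import re
from statistics import median

# Multipliers for the size units assumed to appear in the Load column.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Assumed layout: address, DC, rack, status, state, load value, load unit, ...
RING_LINE = re.compile(
    r"^\s*(?P<addr>\d+\.\d+\.\d+\.\d+)\s+\S+\s+\S+\s+\S+\s+\S+\s+"
    r"(?P<size>[\d.]+)\s+(?P<unit>KB|MB|GB|TB)\s"
)

def parse_ring(text):
    """Return {address: reported load in bytes} for each recognizable ring line."""
    loads = {}
    for line in text.splitlines():
        m = RING_LINE.match(line)
        if m:
            loads[m["addr"]] = float(m["size"]) * UNITS[m["unit"]]
    return loads

def suspicious_nodes(loads, factor=1.3):
    """Return addresses whose load exceeds `factor` times the median load."""
    typical = median(loads.values())
    return sorted(addr for addr, load in loads.items() if load > factor * typical)
```

With an evenly balanced ring like ours, feeding the output above to parse_ring and then suspicious_nodes picks out the nodes that have not been restarted. This only makes sense when all nodes own equal shares of the ring; on an unbalanced ring the median is not a useful baseline.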

Running compaction does not correct the reported load; only restarting Cassandra fixes it.

I hope this helps a little bit at least.


/Henrik Schröder

On Thu, Oct 20, 2011 at 18:53, Dan Hendry <dan.hendry.junk@gmail.com> wrote:

I have been playing around with Cassandra 1.0.0 in our test environment, and it seems pretty sweet so far. I have, however, come across what appears to be a bug in tracking node load. I have enabled compression and levelled compaction on all CFs (scrub + snapshot deletion), and the nodes have been operating normally for a day or two. I started getting concerned when the load as reported by nodetool ring kept increasing (seemingly monotonically) despite seeing a compression ratio of ~2.5x (as a side note, I find it strange that Cassandra does not provide the compression ratio via JMX or in the logs). I initially thought there might be a bug in cleaning up obsolete SSTables, but I then noticed the following discrepancy:

 

Nodetool ring reports:

10.112.27.65    datacenter1 rack1       Up     Normal  8.64 GB         50.00%  170141183460469231731687303715884105727

 

Yet du . -h reports only 2.4G in the data directory.

 

After restarting the node, nodetool ring reports a more accurate load:

10.112.27.65    datacenter1 rack1       Up     Normal  2.35 GB         50.00%  170141183460469231731687303715884105727
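
The comparison between the reported load and the data directory can also be scripted. The following is a minimal Python sketch: it sums file sizes under a directory and divides the reported load by that total. The function names are hypothetical, the data directory path is whatever your install uses, and the result will differ slightly from du, which counts disk blocks rather than file lengths. Snapshots stored under the data directory are included in the total.

```python
import os

def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def load_ratio(reported_bytes, data_dir):
    """Return reported load divided by on-disk size (inf if the directory is empty)."""
    on_disk = dir_size_bytes(data_dir)
    return reported_bytes / on_disk if on_disk else float("inf")
```

For the numbers above, 8.64 GB reported against 2.4G on disk is a ratio of about 3.6, while the post-restart figure of 2.35 GB is close to 1.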

 

Again, both compression and levelled compaction have been enabled on all CFs. Is this a known issue or has anybody else observed a similar pattern?

 

Dan Hendry

(403) 660-2297