Subject: Re: Cassandra disk space utilization WAY higher than I would expect
From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 18 Aug 2010 11:08:10 -0400

On Wed, Aug 18, 2010 at 10:51 AM, Jonathan Ellis wrote:
> If you read the stack traces you pasted, the node in question ran out
> of disk space.  When you have < 25% space free this is not surprising.
>
> But fundamentally you are missing something important from your story
> here.  Disk space doesn't just increase spontaneously with "absolutely
> no activity."
>
> On Wed, Aug 18, 2010 at 9:36 AM, Julie wrote:
>>
>> Rob Coli <...@digg.com> writes:
>>
>>> As I understand Julie's case, she is:
>>> a) initializing her cluster
>>> b) inserting some number of unique keys with CL.ALL
>>> c) noticing that more disk space (6x?) than is expected is used
>>> d) but that she gets expected usage if she does a major compaction
>>> In other words, the problem isn't "temporary disk space occupied during
>>> the compaction", it's permanent disk space occupied unless she compacts.
>>>
>>> Julie: when compaction occurs, it logs the number of bytes that it
>>> started with and the number it ended with, as well as the number of keys
>>> involved in the compaction. What do these messages say?
>>>
>>> example line:
>>> INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java
>>> (line 398) Compacted to /path/to/MyColumnFamily-26-Data.db.
>>> 999999999/888888888 bytes for 12345678 keys.  Time: 123456ms.
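Those "Compacted to ... X/Y bytes for N keys" lines Rob is asking about can be
tallied straight out of system.log to see how much space each compaction
actually reclaimed. A minimal sketch, assuming the 0.6-era message format
quoted in the example above; the regex and the default log path are
illustrative, not something from this thread:

    #!/usr/bin/env python
    # Sum the "Compacted to <file>. <before>/<after> bytes for <keys> keys"
    # lines from a Cassandra system.log and report how much space was reclaimed.
    import re
    import sys

    PATTERN = re.compile(r"Compacted to (\S+)\.\s+(\d+)/(\d+) bytes for (\d+) keys")

    def main(path):
        total_before = total_after = total_keys = 0
        with open(path) as log:
            for line in log:
                m = PATTERN.search(line)
                if not m:
                    continue
                sstable = m.group(1)
                before, after, keys = int(m.group(2)), int(m.group(3)), int(m.group(4))
                total_before += before
                total_after += after
                total_keys += keys
                print("%s: %d -> %d bytes (%d keys)" % (sstable, before, after, keys))
        print("total: %d -> %d bytes, %d reclaimed, %d keys"
              % (total_before, total_after, total_before - total_after, total_keys))

    if __name__ == "__main__":
        main(sys.argv[1] if len(sys.argv) > 1 else "/var/log/cassandra/system.log")

Against ec2-server3's log excerpt below, every entry reports the same byte
count before and after, i.e. those compactions reclaimed nothing, which is
consistent with unique keys and no overwrites.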
>>
>> Rob -
>> I reran the original test: 8 nodes in the cluster (160GB drives on each
>> node). Populated each node with 30GB of data using unique keys and CL.ALL
>> and repFactor=3.  (Wrote 10GB of data to each node but with the repFactor=3,
>> it results in about 30GB of data on each node.)
>>
>> One hour after the last write, the ring distribution looks excellent:
>>
>> Address        Status  Load       Range                                      Ring
>>                                   170141183460469231731687303715884105728
>> 10.210.198.64  Up      28.32 GB   21267647932558653966460912964485513216    |<--|
>> 10.210.157.187 Up      28.07 GB   42535295865117307932921825928971026432    |   ^
>> 10.206.34.194  Up      28.12 GB   63802943797675961899382738893456539648    v   |
>> 10.254.107.178 Up      28.15 GB   85070591730234615865843651857942052864    |   ^
>> 10.254.234.226 Up      28.02 GB   106338239662793269832304564822427566080   v   |
>> 10.254.242.159 Up      27.96 GB   127605887595351923798765477786913079296   |   ^
>> 10.214.18.198  Up      28.18 GB   148873535527910577765226390751398592512   v   |
>> 10.214.26.118  Up      29.82 GB   170141183460469231731687303715884105728   |-->|
>>
>> But 8 hours after the last write (absolutely no activity), things don't
>> look as good:
>>
>> Address        Status  Load       Range                                      Ring
>>                                   170141183460469231731687303715884105728
>> 10.210.198.64  Up      30.28 GB   21267647932558653966460912964485513216    |<--|
>> 10.210.157.187 Up      28.12 GB   42535295865117307932921825928971026432    |   ^
>> 10.206.34.194  Up      122.41 GB  63802943797675961899382738893456539648    v   |
>> 10.254.107.178 Up      33.89 GB   85070591730234615865843651857942052864    |   ^
>> 10.254.234.226 Up      28.01 GB   106338239662793269832304564822427566080   v   |
>> 10.254.242.159 Up      72.58 GB   127605887595351923798765477786913079296   |   ^
>> 10.214.18.198  Up      83.41 GB   148873535527910577765226390751398592512   v   |
>> 10.214.26.118  Up      62.01 GB   170141183460469231731687303715884105728   |-->|
>>
>> The 122.41GB node is named ec2-server3.  Here's what cfstats reports:
>> On ec2-server3:
>>
>>          Write Latency: 0.21970486121446028 ms.
>>          Pending Tasks: 0
>>                  Column Family: Standard1
>>                  SSTable count: 9
>>                  Space used (live): 131438293207
>>                  Space used (total): 143577216419
>>                  Memtable Columns Count: 0
>>                  Memtable Data Size: 0
>>                  Memtable Switch Count: 454
>>                  Read Count: 0
>>                  Read Latency: NaN ms.
>>                  Write Count: 302373
>>                  Write Latency: 0.220 ms.
>>                  Pending Tasks: 0
>>                  Key cache capacity: 200000
>>                  Key cache size: 0
>>                  Key cache hit rate: NaN
>>                  Row cache: disabled
>>                  Compacted row minimum size: 100316
>>                  Compacted row maximum size: 100323
>>                  Compacted row mean size: 100322
>>
>> On ec2-server3, df reports:
>> /dev/sdb             153899044 140784388   5297032  97% /mnt
>>
>> So this node should (I would think) contain 30GB of data on a 160GB hard
>> drive but somehow it has grown to 122 GB of data (live space) plus several
>> compacted files that have not yet been deleted (total space).  Keep in
>> mind that no deletions or updates have taken place, just unique key writes.
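The cfstats above actually back up the ~30 GB expectation. A quick
back-of-the-envelope check, treating "Write Count" as one distinct row per
write (an assumption, but it matches the unique-key workload described):

    # Expected on-disk size from the cfstats figures quoted above.
    rows = 302373                     # "Write Count"
    mean_row_size = 100322            # "Compacted row mean size", in bytes
    expected = rows * mean_row_size   # ~30.3 GB of row data
    live = 131438293207               # "Space used (live)"
    print(expected / 1024.0 ** 3)     # ~28.3 GiB
    print(live / 1024.0 ** 3)         # ~122.4 GiB, roughly 4.3x the expectation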
>>
>> At this point, here are the compaction lines from the ec2-server3 system.log
>> file, taking place after all writes had completed:
>>
>> INFO [COMPACTION-POOL:1] 2010-08-16 14:11:58,614 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db.
>> 8595448738/8595448738 bytes for 85678 keys.  Time: 984905ms.
>> INFO [COMPACTION-POOL:1] 2010-08-16 14:20:02,825 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-466-Data.db.
>> 2157339354/2157339354 bytes for 21504 keys.  Time: 484188ms.
>> INFO [COMPACTION-POOL:1] 2010-08-16 14:28:37,066 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-468-Data.db.
>> 2157339329/2157339329 bytes for 21504 keys.  Time: 514226ms.
>> INFO [COMPACTION-POOL:1] 2010-08-16 14:36:04,806 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-469-Data.db.
>> 2157339417/2157339417 bytes for 21504 keys.  Time: 447720ms.
>> INFO [COMPACTION-POOL:1] 2010-08-16 14:40:31,234 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-470-Data.db.
>> 2157339202/2157339202 bytes for 21504 keys.  Time: 266418ms.
>> INFO [COMPACTION-POOL:1] 2010-08-16 14:58:48,943 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-471-Data.db.
>> 8629357302/8629357302 bytes for 86016 keys.  Time: 1097659ms.
>> INFO [COMPACTION-POOL:1] 2010-08-16 15:02:43,428 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-473-Data.db.
>> 2157338687/2157338687 bytes for 21504 keys.  Time: 234464ms.
>> INFO [COMPACTION-POOL:1] 2010-08-16 15:05:08,807 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-474-Data.db.
>> 1345327705/1345327705 bytes for 13410 keys.  Time: 145363ms.
>>
>> Here's what's in the data directory on this node:
>>
>> -rw-r--r-- 1 root root  9518412981 Aug 16 13:36 Standard1-247-Data.db
>> -rw-r--r-- 1 root root  8595448738 Aug 16 14:11 Standard1-465-Data.db
>> -rw-r--r-- 1 root root 22968417153 Aug 16 14:34 Standard1-467-Data.db
>> -rw-r--r-- 1 root root  8629357302 Aug 16 14:58 Standard1-471-Data.db
>> -rw-r--r-- 1 root root  2157338687 Aug 16 15:02 Standard1-473-Data.db
>> -rw-r--r-- 1 root root  1345327705 Aug 16 15:05 Standard1-474-Data.db
>> -rw-r--r-- 1 root root 28995571589 Aug 16 15:38 Standard1-475-Data.db
>> -rw-r--r-- 1 root root 30063580839 Aug 16 17:26 Standard1-476-Data.db
>> -rw-r--r-- 1 root root 19091100731 Aug 16 17:24 Standard1-477-Data.db
>> Plus a bunch of compacted files.
>>
>> At this point, I performed a manual "nodetool cleanup" on the super large
>> node when *very bad things* happened.  Here's an excerpt from the large
>> node's log file upon issuing the cleanup:
>>
>> INFO [COMPACTION-POOL:1] 2010-08-16 23:04:04,230 CompactionManager.java (line
>> 345) AntiCompacting [org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-474-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-471-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-473-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-477-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-475-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-467-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-247-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-476-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db')]
>>
>> INFO [COMPACTION-POOL:1] 2010-08-16 23:04:04,230 StorageService.java
>> (line 1499) requesting GC to free disk space
>> INFO [GC inspection] 2010-08-16 23:04:04,470 GCInspector.java (line 110)
>> GC for ConcurrentMarkSweep: 233 ms, 299442152 reclaimed leaving 20577560
>> used; max is 1172766720
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:14,641
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-419-Data.db
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:17,974
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-466-Data.db
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,100
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-428-Data.db
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,225
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-415-Data.db
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,374
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-441-Data.db
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,514
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-461-Data.db
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,623
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-442-Data.db
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,740
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-425-Data.db
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,891
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-447-Data.db
>> …
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,331
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-451-Data.db
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,423
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-450-Data.db
>> ERROR [COMPACTION-POOL:1] 2010-08-16 23:04:25,525
>> DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask
>> java.util.concurrent.ExecutionException:
>> java.lang.UnsupportedOperationException: disk full
>>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>         at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
>>         at org.apache.cassandra.db.CompactionManager$CompactionExecutor.afterExecute(CompactionManager.java:582)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.lang.UnsupportedOperationException: disk full
>>         at org.apache.cassandra.db.CompactionManager.doAntiCompaction(CompactionManager.java:351)
>>         at org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:417)
>>         at org.apache.cassandra.db.CompactionManager.access$400(CompactionManager.java:49)
>>         at org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:130)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         ... 2 more
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,558
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-449-Data.db
>> …
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:33,265
>> SSTableDeletingReference.java (line 104) Deleted
>> /var/lib/cassandra/data/Keyspace1/Standard1-453-Data.db
>> INFO [COMPACTION-POOL:1] 2010-08-16 23:05:08,656 CompactionManager.java
>> (line 246) Compacting []
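The trace makes the failure mode concrete: on 0.6, "nodetool cleanup" runs an
anti-compaction (the doAntiCompaction frame above), which has to write new
sstables before the old ones can be dropped, so it needs roughly as much free
space as the data it is rewriting. With the disk already at 97% there is
nowhere to write, hence the "disk full" error. A minimal pre-flight check, as
a sketch only: the data directory is the one from the listing above, the 1x
margin is an assumption, and the "-Compacted" marker test relies on the 0.6
on-disk layout as best I recall it:

    #!/usr/bin/env python
    # Pre-flight check before "nodetool cleanup" on 0.6: compare free space on
    # the data volume with the live sstable data an anticompaction may rewrite.
    import os

    DATA_DIR = "/var/lib/cassandra/data/Keyspace1"  # DataFileDirectory for the keyspace
    MARGIN = 1.0   # assume cleanup may need ~1x the live data as scratch space

    def live_bytes(data_dir):
        total = 0
        for name in os.listdir(data_dir):
            if not name.endswith("-Data.db"):
                continue
            # Obsolete sstables carry a sibling "-Compacted" marker file; they
            # are only waiting to be deleted and are not rewritten by cleanup.
            marker = os.path.join(data_dir, name.replace("-Data.db", "-Compacted"))
            if not os.path.exists(marker):
                total += os.path.getsize(os.path.join(data_dir, name))
        return total

    def free_bytes(path):
        st = os.statvfs(path)
        return st.f_bavail * st.f_frsize

    if __name__ == "__main__":
        live = live_bytes(DATA_DIR)
        free = free_bytes(DATA_DIR)
        print("live: %.1f GB  free: %.1f GB" % (live / 1024.0 ** 3, free / 1024.0 ** 3))
        if free < MARGIN * live:
            print("not enough headroom for cleanup -- expect 'disk full'")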
>>
>> The next morning (19 hours later):
>>
>> INFO [COMPACTION-POOL:1] 2010-08-16 23:05:08,656 CompactionManager.java
>> (line 246) Compacting []
>> INFO [COMPACTION-POOL:1] 2010-08-17 00:05:08,637 CompactionManager.java
>> (line 246) Compacting []
>> INFO [COMPACTION-POOL:1] 2010-08-17 01:05:08,607 CompactionManager.java
>> (line 246) Compacting []
>> INFO [COMPACTION-POOL:1] 2010-08-17 02:05:08,581 CompactionManager.java
>> (line 246) Compacting []
>> INFO [COMPACTION-POOL:1] 2010-08-17 03:05:08,568 CompactionManager.java
>> (line 246) Compacting []
>> INFO [COMPACTION-POOL:1] 2010-08-17 04:05:08,532 CompactionManager.java
>> (line 246) Compacting []
>> INFO [COMPACTION-POOL:1] 2010-08-17 05:05:08,505 CompactionManager.java
>> (line 246) Compacting []
>> INFO [COMPACTION-POOL:1] 2010-08-17 06:05:08,494 CompactionManager.java
>> (line 246) Compacting []
>>
>> Also, 19 hours later, ring distribution is the same:
>>
>> Address        Status  Load       Range                                      Ring
>>                                   170141183460469231731687303715884105728
>> 10.210.198.64  Up      30.28 GB   21267647932558653966460912964485513216    |<--|
>> 10.210.157.187 Up      28.12 GB   42535295865117307932921825928971026432    |   ^
>> 10.206.34.194  Up      122.41 GB  63802943797675961899382738893456539648    v   |
>> 10.254.107.178 Up      33.89 GB   85070591730234615865843651857942052864    |   ^
>> 10.254.234.226 Up      28.01 GB   106338239662793269832304564822427566080   v   |
>> 10.254.242.159 Up      72.58 GB   127605887595351923798765477786913079296   |   ^
>> 10.214.18.198  Up      83.41 GB   148873535527910577765226390751398592512   v   |
>> 10.214.26.118  Up      62.01 GB   170141183460469231731687303715884105728   |-->|
>>
>> So nodetool cleanup seems to have resulted in a fatal error on my overly
>> (>4x) bloated node.
>>
>> Can anyone help me with understanding why this happened, taking into account
>> that the node should only contain 30GB of data on a 160GB hard drive?  I have
>> the 8 system.log files from the 24 hour period (the nodes were only alive
>> for 24 hours total).
>>
>> Thank you!
>> Julie
>>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

I noticed something funny about repair as well. I upgraded to 0.6.3, then
joined a node with bootstrap set to false, so the node was empty. I ran
REPAIR on the new node. Later I noticed that the node had 2-4x the data it
should have had.

Triggering REPAIR should cause anticompaction on the closest nodes on the
ring. Those nodes should then stream ONLY the data that belongs on the new
node to the new node. Right?
Judging by the size of the new node, I assumed that the entire contents of
both of those nodes were streamed to it (though I could be wrong about that).
Sorry, I have no good test case to replicate this.
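For what it's worth, the quickest way to spot either symptom (Julie's 122 GB
node, or a freshly repaired node carrying several times its share) is to pull
the Load column out of nodetool ring and flag outliers. A rough sketch against
the 0.6 ring output format quoted above; the host is a placeholder and the
1.5x threshold is arbitrary:

    #!/usr/bin/env python
    # Flag ring members whose Load is far above the rest, using the 0.6
    # "nodetool ring" output format shown earlier in this thread.
    import re
    import subprocess

    RING_HOST = "localhost"   # any live node in the cluster

    def ring_loads(host):
        out = subprocess.Popen(["nodetool", "-h", host, "ring"],
                               stdout=subprocess.PIPE,
                               universal_newlines=True).communicate()[0]
        loads = {}
        for line in out.splitlines():
            # e.g. "10.206.34.194  Up  122.41 GB  63802943797675961899382738893456539648 ..."
            m = re.match(r"(\d+\.\d+\.\d+\.\d+)\s*Up\s+([\d.]+) (KB|MB|GB)", line)
            if m:
                to_gb = {"KB": 1.0 / 1024 ** 2, "MB": 1.0 / 1024, "GB": 1.0}[m.group(3)]
                loads[m.group(1)] = float(m.group(2)) * to_gb
        return loads

    if __name__ == "__main__":
        loads = ring_loads(RING_HOST)
        median = sorted(loads.values())[len(loads) // 2]
        for addr, gb in sorted(loads.items()):
            flag = "  <-- well above the median" if gb > 1.5 * median else ""
            print("%-15s %8.2f GB%s" % (addr, gb, flag))

Run against Julie's second listing above, that would at least single out the
122.41 GB node.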