cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Cassandra disk space utilization WAY higher than I would expect
Date Wed, 18 Aug 2010 15:08:10 GMT
On Wed, Aug 18, 2010 at 10:51 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> If you read the stack traces you pasted, the node in question ran out
> of diskspace.  When you have < 25% space free this is not surprising.
>
> But fundamentally you are missing something important from your story
> here.  Disk space doesn't just increase spontaneously with "absolutely
> no activity."
>
> On Wed, Aug 18, 2010 at 9:36 AM, Julie <julie.sugar@nextcentury.com> wrote:
>>
>> Rob Coli <rcoli <at> digg.com> writes:
>>
>>
>>> As I understand Julie's case, she is :
>>> a) initializing her cluster
>>> b) inserting some number of unique keys with CL.ALL
>>> c) noticing that more disk space (6x?) than is expected is used
>>> d) but that she gets expected usage if she does a major compaction
>>> In other words, the problem isn't "temporary disk space occupied during
>>> the compact", it's permanent disk space occupied unless she compacts.
>>>
>>> Julie : when compaction occurs, it logs the number of bytes that it
>>> started with and the number it ended with, as well as the number of keys
>>> involved in the compaction. What do these messages say?
>>>
>>> example line :
>>> INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java
>>> (line 398) Compacted to /path/to/MyColumnFamily-26-Data.db.
>>> 999999999/888888888 bytes for 12345678 keys.  Time: 123456ms.
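A quick way to pull those numbers out of such a line (a rough sketch; the regex is inferred from the single example line above, not from the CompactionManager source):

```python
import re

# The example log line quoted above; the field layout (start/end bytes,
# key count) is assumed from this one sample.
line = ("Compacted to /path/to/MyColumnFamily-26-Data.db. "
        "999999999/888888888 bytes for 12345678 keys.  Time: 123456ms.")

m = re.search(r"Compacted to (\S+)\. (\d+)/(\d+) bytes for (\d+) keys", line)
path = m.group(1)
start_bytes, end_bytes, keys = (int(m.group(i)) for i in (2, 3, 4))

# A ratio below 1.0 means the compaction reclaimed space.
print(path, end_bytes / start_bytes, keys)
```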
>>
>> Rob -
>> I reran the original test: 8 nodes in the cluster (160GB drives on each
>> node). Populated each node with 30GB of data using unique keys, CL.ALL,
>> and repFactor=3.  (Wrote 10GB of unique data to each node; with repFactor=3,
>> that results in about 30GB of data on each node.)
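As a sanity check on those figures (simple arithmetic, assuming perfectly uniform key distribution and no storage overhead):

```python
# Expected steady-state load, assuming uniform distribution and no overhead.
nodes = 8
raw_gb_per_node = 10   # unique data written per node
rf = 3                 # replication factor

expected_gb_per_node = raw_gb_per_node * rf
cluster_total_gb = nodes * expected_gb_per_node

print(expected_gb_per_node)  # 30 -> consistent with the ~28 GB loads below
print(cluster_total_gb)      # 240 for the whole ring
```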
>>
>> One hour after the last write, the ring distribution looks excellent:
>> Address         Status  Load       Range                                     Ring
>>                                    170141183460469231731687303715884105728
>> 10.210.198.64   Up      28.32 GB   21267647932558653966460912964485513216   |<--|
>> 10.210.157.187  Up      28.07 GB   42535295865117307932921825928971026432   |   ^
>> 10.206.34.194   Up      28.12 GB   63802943797675961899382738893456539648   v   |
>> 10.254.107.178  Up      28.15 GB   85070591730234615865843651857942052864   |   ^
>> 10.254.234.226  Up      28.02 GB   106338239662793269832304564822427566080  v   |
>> 10.254.242.159  Up      27.96 GB   127605887595351923798765477786913079296  |   ^
>> 10.214.18.198   Up      28.18 GB   148873535527910577765226390751398592512  v   |
>> 10.214.26.118   Up      29.82 GB   170141183460469231731687303715884105728  |-->|
>>
>> But 8 hours after the last write (absolutely no activity), things don't
>> look as good:
>>
>> Address         Status  Load        Range                                     Ring
>>                                     170141183460469231731687303715884105728
>> 10.210.198.64   Up      30.28 GB    21267647932558653966460912964485513216   |<--|
>> 10.210.157.187  Up      28.12 GB    42535295865117307932921825928971026432   |   ^
>> 10.206.34.194   Up      122.41 GB   63802943797675961899382738893456539648   v   |
>> 10.254.107.178  Up      33.89 GB    85070591730234615865843651857942052864   |   ^
>> 10.254.234.226  Up      28.01 GB    106338239662793269832304564822427566080  v   |
>> 10.254.242.159  Up      72.58 GB    127605887595351923798765477786913079296  |   ^
>> 10.214.18.198   Up      83.41 GB    148873535527910577765226390751398592512  v   |
>> 10.214.26.118   Up      62.01 GB    170141183460469231731687303715884105728  |-->|
>>
>> The 122.41GB node is named ec2-server3.  Here's what cfstats reports:
>> On ec2-server3:
>>
>>          Write Latency: 0.21970486121446028 ms.
>>          Pending Tasks: 0
>>                   Column Family: Standard1
>>                   SSTable count: 9
>>                        Space used (live): 131438293207
>>                        Space used (total): 143577216419
>>                   Memtable Columns Count: 0
>>                   Memtable Data Size: 0
>>                   Memtable Switch Count: 454
>>                   Read Count: 0
>>                   Read Latency: NaN ms.
>>                   Write Count: 302373
>>                   Write Latency: 0.220 ms.
>>                   Pending Tasks: 0
>>                   Key cache capacity: 200000
>>                   Key cache size: 0
>>                   Key cache hit rate: NaN
>>                   Row cache: disabled
>>                   Compacted row minimum size: 100316
>>                   Compacted row maximum size: 100323
>>                   Compacted row mean size: 100322
>>
>> On ec2-server3, df reports:
>> /dev/sdb             153899044 140784388   5297032  97% /mnt
>>
>> So this node should (I would think) contain 30GB of data on a 160GB hard
>> drive but somehow it has grown to 122 GB of data (live space) plus several
>> compacted files that have not yet been deleted (total space).  Keep in
>> mind that no deletions or updates have taken place, just unique key writes.
>>
>> At this point, here are the compaction lines from the ec2-server3 system.log
>> file, taking place after all writes had completed:
>>
>> INFO [COMPACTION-POOL:1] 2010-08-16 14:11:58,614 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db.
>> 8595448738/8595448738 bytes for 85678 keys.  Time: 984905ms.
>>  INFO [COMPACTION-POOL:1] 2010-08-16 14:20:02,825 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-466-Data.db.
>> 2157339354/2157339354 bytes for 21504 keys.  Time: 484188ms.
>>  INFO [COMPACTION-POOL:1] 2010-08-16 14:28:37,066 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-468-Data.db.
>> 2157339329/2157339329 bytes for 21504 keys.  Time: 514226ms.
>>  INFO [COMPACTION-POOL:1] 2010-08-16 14:36:04,806 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-469-Data.db.
>> 2157339417/2157339417 bytes for 21504 keys.  Time: 447720ms.
>>  INFO [COMPACTION-POOL:1] 2010-08-16 14:40:31,234 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-470-Data.db.
>> 2157339202/2157339202 bytes for 21504 keys.  Time: 266418ms.
>>  INFO [COMPACTION-POOL:1] 2010-08-16 14:58:48,943 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-471-Data.db.
>> 8629357302/8629357302 bytes for 86016 keys.  Time: 1097659ms.
>>  INFO [COMPACTION-POOL:1] 2010-08-16 15:02:43,428 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-473-Data.db.
>> 2157338687/2157338687 bytes for 21504 keys.  Time: 234464ms.
>>  INFO [COMPACTION-POOL:1] 2010-08-16 15:05:08,807 CompactionManager.java (line
>> 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-474-Data.db.
>> 1345327705/1345327705 bytes for 13410 keys.  Time: 145363ms.
>>
>> Here’s what’s in the data directory on this node:
>>
>> -rw-r--r-- 1 root root  9518412981 Aug 16 13:36 Standard1-247-Data.db
>> -rw-r--r-- 1 root root  8595448738 Aug 16 14:11 Standard1-465-Data.db
>> -rw-r--r-- 1 root root 22968417153 Aug 16 14:34 Standard1-467-Data.db
>> -rw-r--r-- 1 root root  8629357302 Aug 16 14:58 Standard1-471-Data.db
>> -rw-r--r-- 1 root root  2157338687 Aug 16 15:02 Standard1-473-Data.db
>> -rw-r--r-- 1 root root  1345327705 Aug 16 15:05 Standard1-474-Data.db
>> -rw-r--r-- 1 root root 28995571589 Aug 16 15:38 Standard1-475-Data.db
>> -rw-r--r-- 1 root root 30063580839 Aug 16 17:26 Standard1-476-Data.db
>> -rw-r--r-- 1 root root 19091100731 Aug 16 17:24 Standard1-477-Data.db
>> Plus a bunch of compacted files.
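For what it's worth, summing the nine live Data.db files listed above accounts for almost all of the cfstats "Space used (live)" figure; the small remainder is presumably the accompanying index and filter files:

```python
# Byte sizes of the live Data.db files from the directory listing above.
data_files = [
    9_518_412_981,   # Standard1-247
    8_595_448_738,   # Standard1-465
    22_968_417_153,  # Standard1-467
    8_629_357_302,   # Standard1-471
    2_157_338_687,   # Standard1-473
    1_345_327_705,   # Standard1-474
    28_995_571_589,  # Standard1-475
    30_063_580_839,  # Standard1-476
    19_091_100_731,  # Standard1-477
]

total = sum(data_files)
print(total)                    # 131364555725 bytes, ~70 MB under the live figure
print(round(total / 2**30, 2))  # ~122.34 GiB, close to the 122.41 GB ring load
```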
>>
>> At this point I performed a manual "nodetool cleanup" on the super large
>> node, and *very bad things* happened.  Here's an excerpt from the large
>> node's log file upon issuing the cleanup:
>>
>> INFO [COMPACTION-POOL:1] 2010-08-16 23:04:04,230 CompactionManager.java (line
>> 345) AntiCompacting [org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-474-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-471-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-473-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-477-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-475-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-467-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-247-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-476-Data.db'),
>> org.apache.cassandra.io.SSTableReader(
>> path='/var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db')]
>>
>>  INFO [COMPACTION-POOL:1] 2010-08-16 23:04:04,230 StorageService.java
>> (line 1499) requesting GC to free disk space
>>  INFO [GC inspection] 2010-08-16 23:04:04,470 GCInspector.java (line 110)
>> GC for ConcurrentMarkSweep: 233 ms, 299442152 reclaimed leaving 20577560
>> used; max is 1172766720
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:14,641
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-419-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:17,974
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-466-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,100
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-428-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,225
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-415-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,374
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-441-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,514
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-461-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,623
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-442-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,740
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-425-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,891
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-447-Data.db
>> …
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,331
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-451-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,423
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-450-Data.db
>> ERROR [COMPACTION-POOL:1] 2010-08-16 23:04:25,525
>> DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask
>> java.util.concurrent.ExecutionException:
>> java.lang.UnsupportedOperationException: disk full
>>        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
>>        at org.apache.cassandra.db.CompactionManager$CompactionExecutor.afterExecute(CompactionManager.java:582)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>        at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.lang.UnsupportedOperationException: disk full
>>        at org.apache.cassandra.db.CompactionManager.doAntiCompaction(CompactionManager.java:351)
>>        at org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:417)
>>        at org.apache.cassandra.db.CompactionManager.access$400(CompactionManager.java:49)
>>        at org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:130)
>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>        ... 2 more
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,558
>> SSTableDeletingReference.java (line 104)
>> Deleted /var/lib/cassandra/data/Keyspace1/Standard1-449-Data.db
>> …
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:33,265
>> SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
>> /data/Keyspace1/Standard1-453-Data.db
>>  INFO [COMPACTION-POOL:1] 2010-08-16 23:05:08,656 CompactionManager.java
>> (line 246) Compacting []
>>
>> The next morning (19 hours later):
>>
>> INFO [COMPACTION-POOL:1] 2010-08-16 23:05:08,656 CompactionManager.java
>> (line 246) Compacting []
>>  INFO [COMPACTION-POOL:1] 2010-08-17 00:05:08,637 CompactionManager.java
>> (line 246) Compacting []
>>  INFO [COMPACTION-POOL:1] 2010-08-17 01:05:08,607 CompactionManager.java
>> (line 246) Compacting []
>>  INFO [COMPACTION-POOL:1] 2010-08-17 02:05:08,581 CompactionManager.java
>> (line 246) Compacting []
>>  INFO [COMPACTION-POOL:1] 2010-08-17 03:05:08,568 CompactionManager.java
>> (line 246) Compacting []
>>  INFO [COMPACTION-POOL:1] 2010-08-17 04:05:08,532 CompactionManager.java
>> (line 246) Compacting []
>>  INFO [COMPACTION-POOL:1] 2010-08-17 05:05:08,505 CompactionManager.java
>> (line 246) Compacting []
>>  INFO [COMPACTION-POOL:1] 2010-08-17 06:05:08,494 CompactionManager.java
>> (line 246) Compacting []
>>
>> Also, 19 hours later, ring distribution is the same:
>>
>> Address         Status  Load        Range                                     Ring
>>                                     170141183460469231731687303715884105728
>> 10.210.198.64   Up      30.28 GB    21267647932558653966460912964485513216   |<--|
>> 10.210.157.187  Up      28.12 GB    42535295865117307932921825928971026432   |   ^
>> 10.206.34.194   Up      122.41 GB   63802943797675961899382738893456539648   v   |
>> 10.254.107.178  Up      33.89 GB    85070591730234615865843651857942052864   |   ^
>> 10.254.234.226  Up      28.01 GB    106338239662793269832304564822427566080  v   |
>> 10.254.242.159  Up      72.58 GB    127605887595351923798765477786913079296  |   ^
>> 10.214.18.198   Up      83.41 GB    148873535527910577765226390751398592512  v   |
>> 10.214.26.118   Up      62.01 GB    170141183460469231731687303715884105728  |-->|
>>
>> So nodetool cleanup seems to have resulted in a fatal error on my overly (>4x)
>> bloated node.
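That failure is consistent with the df output quoted earlier: anticompaction rewrites SSTables, so (as a rough rule of thumb; the exact headroom requirement is my assumption, not something stated in the logs) free space on the order of the live data size is needed, and this node had nowhere near that:

```python
# Figures from the df and cfstats output above (df reports 1K blocks).
free_bytes = 5_297_032 * 1024   # "Available" column on /mnt
live_bytes = 131_438_293_207    # "Space used (live)" from cfstats

print(round(free_bytes / 1e9, 1))  # 5.4 GB free
print(round(live_bytes / 1e9, 1))  # 131.4 GB live
print(free_bytes > live_bytes)     # False -> cleanup was bound to hit "disk full"
```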
>>
>> Can anyone help me understand why this happened, given that the node should
>> only contain 30GB of data on a 160GB hard drive?  I have the 8 system.log
>> files from the 24-hour period (the nodes were only alive for 24 hours
>> total).
>>
>> Thank you!
>> Julie
>>
>>
>>
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

I noticed something funny about repair as well. I upgraded to 6.3 then
joined a node with bootstrap set to false.

So the node was empty. I ran REPAIR on the new node. Later I noticed
that the node had 2-4x the data it should have had.

Triggering REPAIR should cause anti-compaction on the closest nodes on
the ring. Those nodes should then stream ONLY the data that belongs on
the new node to the new node. Right? Judging by the size of the new
node, I assumed that the entire contents of both nodes were streamed to
it. (This could be wrong, though.) Sorry, I have no good test case to
replicate this.
