cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Compacting single file forever
Date Wed, 27 Apr 2011 21:18:32 GMT
https://issues.apache.org/jira/browse/CASSANDRA-2575

On Thu, Apr 21, 2011 at 11:56 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> I suggest as a workaround making the forceUserDefinedCompaction method
> ignore disk space estimates and attempt the requested compaction even
> if it guesses it will not have enough space. This would allow you to
> submit the 2 sstables you want manually.
>
> On Thu, Apr 21, 2011 at 8:34 AM, Shotaro Kamio <kamioshot@gmail.com> wrote:
>> Hi Aaron,
>>
>>
>> Maybe my previous description was not clear. It's not a compaction
>> threshold problem.
>> In fact, Cassandra tries to compact 7 sstables in the minor
>> compaction, but it reduces the number of sstables one by one due to
>> insufficient disk space. In the end, it compacts a single file, as in
>> the new log below.
>>
>> Compactionstats on a node says:
>>
>>  compaction type: Minor
>>  column family: foobar
>>  bytes compacted: 133473101929
>>  bytes total in progress: 170000743825
>>  pending tasks: 12
>>
>> The disk usage has reached 78%. It's a really tough situation. But I guess
>> the data contains a lot of duplicates, because we feed the same data again
>> and again and run repair.
>>
>>
>> Another thing I'm wondering about is the file selection algorithm.
>> For example, one of the disks has 235G of free space. It contains sstables of
>> 61G, 159G, 191G, 196G, 197G. The one Cassandra keeps trying to compact
>> forever is the 159G sstable. But there is a smaller sstable. Ideally, it
>> should try compacting 61G + 159G.
>> A more intelligent algorithm is required to find the optimal combination.
>> And if Cassandra kept statistics about the amount of deleted data and old
>> data to be compacted for each sstable, that would help it find a more
>> efficient file combination.
>>
>>
>> Regards,
>> Shotaro
>>
>>
>>
>> * Minor compaction log
>> -----
>>  WARN [CompactionExecutor:1] 2011-04-21 21:44:08,554
>> CompactionManager.java (line 405) insufficient space to compact all
>> requested files SSTableReader(path='foobar-f-773-Data.db'),
>> SSTableReader(path='foobar-f-1452-Data.db'),
>> SSTableReader(path='foobar-f-1620-Data.db'),
>> SSTableReader(path='foobar-f-1642-Data.db'),
>> SSTableReader(path='foobar-f-1643-Data.db'),
>> SSTableReader(path='foobar-f-1690-Data.db'),
>> SSTableReader(path='foobar-f-1814-Data.db')
>>  WARN [CompactionExecutor:1] 2011-04-21 21:44:28,565
>> CompactionManager.java (line 405) insufficient space to compact all
>> requested files SSTableReader(path='foobar-f-773-Data.db'),
>> SSTableReader(path='foobar-f-1452-Data.db'),
>> SSTableReader(path='foobar-f-1642-Data.db'),
>> SSTableReader(path='foobar-f-1643-Data.db'),
>> SSTableReader(path='foobar-f-1690-Data.db'),
>> SSTableReader(path='foobar-f-1814-Data.db')
>>  WARN [CompactionExecutor:1] 2011-04-21 21:44:48,576
>> CompactionManager.java (line 405) insufficient space to compact all
>> requested files SSTableReader(path='foobar-f-773-Data.db'),
>> SSTableReader(path='foobar-f-1452-Data.db'),
>> SSTableReader(path='foobar-f-1642-Data.db'),
>> SSTableReader(path='foobar-f-1643-Data.db'),
>> SSTableReader(path='foobar-f-1814-Data.db')
>>  WARN [CompactionExecutor:1] 2011-04-21 21:45:08,586
>> CompactionManager.java (line 405) insufficient space to compact all
>> requested files SSTableReader(path='foobar-f-1452-Data.db'),
>> SSTableReader(path='foobar-f-1642-Data.db'),
>> SSTableReader(path='foobar-f-1643-Data.db'),
>> SSTableReader(path='foobar-f-1814-Data.db')
>>  WARN [CompactionExecutor:1] 2011-04-21 21:45:28,596
>> CompactionManager.java (line 405) insufficient space to compact all
>> requested files SSTableReader(path='foobar-f-1642-Data.db'),
>> SSTableReader(path='foobar-f-1643-Data.db'),
>> SSTableReader(path='foobar-f-1814-Data.db')
>>  WARN [CompactionExecutor:1] 2011-04-21 21:45:48,607
>> CompactionManager.java (line 405) insufficient space to compact all
>> requested files SSTableReader(path='foobar-f-1642-Data.db'),
>> SSTableReader(path='foobar-f-1814-Data.db')
>> ------
>>
>>
>>
>> On Thu, Apr 21, 2011 at 7:20 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>> I want to check whether you are talking about minor compactions or major (nodetool)
>>> compactions.
>>> What compaction settings do you have for this CF? You can increase
>>> the min compaction threshold and reduce the frequency of
>>> compactions: http://wiki.apache.org/cassandra/StorageConfiguration
>>> It seems like compaction is running continually; are there pending tasks in
>>> the o.a.c.db.CompactionManager MBean?
>>> How bad is your disk space problem?
>>> For the code change, AFAIK it's not possible for cassandra to know if there
>>> are tombstones in the SSTable which can be purged until the rows are read.
>>> Perhaps the file could hold the earliest deleted at time somewhere (same for
>>> TTL), but I do not think we do that now.
>>> Hope that helps.
>>> Aaron
>>>
>>> On 20 Apr 2011, at 21:25, Shotaro Kamio wrote:
>>>
>>> Hi,
>>>
>>> I found that our cluster keeps compacting a single file forever
>>> (cassandra 0.7.5). We are wondering if the compaction logic is wrong. I'd
>>> like to have comments from you guys.
>>>
>>> Situation:
>>> - After trying to repair a column family, our cluster's disk usage is
>>> quite high. Cassandra cannot compact all sstables at once. I think it
>>> ends up compacting a single file repeatedly (you can check the attached
>>> log below).
>>> - Our data doesn't have deletes. So compacting a single file
>>> doesn't free any disk space.
>>>
>>> We are approaching a full disk. But I believe that the repair
>>> operation created a lot of duplicate data on the disk, which requires
>>> compaction. However, most of the nodes are stuck compacting a single file.
>>> The only thing we can do is to restart the nodes.
>>>
>>> My question is why the compaction doesn't stop.
>>>
>>> I looked at the logic in CompactionManager.java:
>>> -----------------
>>>        String compactionFileLocation = table.getDataFileLocation(cfs.getExpectedCompactedFileSize(sstables));
>>>        // If the compaction file path is null that means we have no space left for this compaction.
>>>        // try again w/o the largest one.
>>>        List<SSTableReader> smallerSSTables = new ArrayList<SSTableReader>(sstables);
>>>        while (compactionFileLocation == null && smallerSSTables.size() > 1)
>>>        {
>>>            logger.warn("insufficient space to compact all requested files " + StringUtils.join(smallerSSTables, ", "));
>>>            smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables));
>>>            compactionFileLocation = table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables));
>>>        }
>>>        if (compactionFileLocation == null)
>>>        {
>>>            logger.error("insufficient space to compact even the two smallest files, aborting");
>>>            return 0;
>>>        }
>>> -----------------
>>>
>>> The while condition is: smallerSSTables.size() > 1
>>> Shouldn't this be "smallerSSTables.size() > 2"?
>>>
>>> In my understanding, compacting a single file frees disk space
>>> only when the sstable has a lot of tombstones and only if those tombstones
>>> are removed by the compaction. If Cassandra knows the sstable has
>>> tombstones to be removed, it's worth compacting it. Otherwise, it
>>> might free some space if you are lucky. In the worst case, it leads to
>>> an infinite loop, as in our case.
>>>
>>> What do you think about this code change?
>>>
>>>
>>> Best regards,
>>> Shotaro
>>>
>>>
>>> * Cassandra compaction log
>>> -------------------------
>>> WARN [CompactionExecutor:1] 2011-04-20 01:03:14,446
>>> CompactionManager.java (line 405) insufficient space to compact all
>>> requested files SSTableReader(path='foobar-f-3020-Data.db'),
>>> SSTableReader(path='foobar-f-3034-Data.db')
>>> INFO [CompactionExecutor:1] 2011-04-20 03:47:29,833
>>> CompactionManager.java (line 482) Compacted to
>>> foobar-tmp-f-3035-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
>>> of original) bytes for 6,893,896 keys.  Time: 9,855,385ms.
>>>
>>> WARN [CompactionExecutor:1] 2011-04-20 03:48:11,308
>>> CompactionManager.java (line 405) insufficient space to compact all
>>> requested files SSTableReader(path='foobar-f-3020-Data.db'),
>>> SSTableReader(path='foobar-f-3035-Data.db')
>>> INFO [CompactionExecutor:1] 2011-04-20 06:31:41,193
>>> CompactionManager.java (line 482) Compacted to
>>> foobar-tmp-f-3036-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
>>> of original) bytes for 6,893,896 keys.  Time: 9,809,882ms.
>>>
>>> WARN [CompactionExecutor:1] 2011-04-20 06:32:22,476
>>> CompactionManager.java (line 405) insufficient space to compact all
>>> requested files SSTableReader(path='foobar-f-3020-Data.db'),
>>> SSTableReader(path='foobar-f-3036-Data.db')
>>> INFO [CompactionExecutor:1] 2011-04-20 09:20:29,903
>>> CompactionManager.java (line 482) Compacted to
>>> foobar-tmp-f-3037-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
>>> of original) bytes for 6,893,896 keys.  Time: 10,087,424ms.
>>> -------------------------
>>> You can see that the compacted size is always the same. It keeps
>>> compacting the same single sstable.
>>>
>>>
>>
>>
>>
>> --
>> Shotaro Kamio
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
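
For readers of the archive, here is a minimal, self-contained sketch of the loop-guard question raised in the quoted thread. It is not the Cassandra code: the class name CompactionShrinkSketch, the methods fits and chooseFiles, and the minFiles parameter are all hypothetical, sizes are plain numbers in arbitrary units, and the free-space check is assumed to be a simple "sum of sizes <= free space" test standing in for table.getDataFileLocation(...). It only illustrates how the lower bound in the while condition decides whether the loop can fall through to a single file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CompactionShrinkSketch {

    // Hypothetical stand-in for the real free-space check: the candidates
    // "fit" when their summed size does not exceed the free space.
    static boolean fits(List<Long> sizes, long freeSpace) {
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        return total <= freeSpace;
    }

    // Drops the largest candidate while the set does not fit and more than
    // minFiles candidates remain. Returns the files to compact, or an empty
    // list to signal "abort".
    static List<Long> chooseFiles(List<Long> sizes, long freeSpace, int minFiles) {
        List<Long> candidates = new ArrayList<>(sizes);
        while (!fits(candidates, freeSpace) && candidates.size() > minFiles) {
            Long largest = Collections.max(candidates);
            candidates.remove(largest); // remove(Object): drops one occurrence
        }
        if (fits(candidates, freeSpace)) {
            return candidates;
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Made-up sizes: two files that do not fit together.
        List<Long> sizes = Arrays.asList(260L, 250L);
        // minFiles = 1 mirrors the "size() > 1" guard in the quoted snippet.
        System.out.println(chooseFiles(sizes, 300L, 1));
        // minFiles = 2 mirrors the proposed "size() > 2" guard.
        System.out.println(chooseFiles(sizes, 300L, 2));
    }
}
```

Under these assumptions, a guard of 1 lets the loop shrink the set to the single 250-unit file, which is then "compacted" alone, while a guard of 2 stops at two files and aborts because they do not fit together. Whether aborting is preferable to a single-file compaction depends on whether that file holds purgeable tombstones, as discussed in the thread.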
