cassandra-dev mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Compacting single file forever
Date Thu, 21 Apr 2011 09:22:32 GMT
Moving to the user list. 

Aaron

On 20 Apr 2011, at 21:25, Shotaro Kamio wrote:

> Hi,
> 
> Our cluster (Cassandra 0.7.5) keeps compacting a single file forever.
> We are wondering whether the compaction logic is wrong, and I'd like
> to hear your comments.
> 
> Situation:
> - After trying to repair a column family, our cluster's disk usage is
> quite high. Cassandra cannot compact all sstables at once. I think it
> ends up compacting a single file over and over (see the attached log
> below).
> - Our data has no deletes, so compacting a single file does not free
> any disk space.
> 
> We are approaching a full disk. I believe the repair operation
> created a lot of duplicate data on disk, which needs compaction.
> However, most nodes are stuck compacting a single file.
> The only thing we can do is restart the nodes.
> 
> My question is why the compaction doesn't stop.
> 
> I looked at the logic in CompactionManager.java:
> -----------------
>        String compactionFileLocation = table.getDataFileLocation(cfs.getExpectedCompactedFileSize(sstables));
>        // If the compaction file path is null that means we have no space left for this compaction.
>        // try again w/o the largest one.
>        List<SSTableReader> smallerSSTables = new ArrayList<SSTableReader>(sstables);
>        while (compactionFileLocation == null && smallerSSTables.size() > 1)
>        {
>            logger.warn("insufficient space to compact all requested files " + StringUtils.join(smallerSSTables, ", "));
>            smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables));
>            compactionFileLocation = table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables));
>        }
>        if (compactionFileLocation == null)
>        {
>            logger.error("insufficient space to compact even the two smallest files, aborting");
>            return 0;
>        }
> -----------------
> 
> The while condition is: smallerSSTables.size() > 1
> Should this be "smallerSSTables.size() > 2"?
> 
> In my understanding, compacting a single file frees disk space only
> when the sstable has a lot of tombstones and only if those tombstones
> are removed during the compaction. If Cassandra knows the sstable has
> tombstones to be removed, it is worth compacting. Otherwise, it
> might free some space if you are lucky. In the worst case, it leads
> to an infinite loop, as in our case.
> 
> What do you think of this code change?
> 
> 
> Best regards,
> Shotaro
> 
> 
> * Cassandra compaction log
> -------------------------
> WARN [CompactionExecutor:1] 2011-04-20 01:03:14,446
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-3020-Data.db'),
> SSTableReader(path='foobar-f-3034-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 03:47:29,833
> CompactionManager.java (line 482) Compacted to
> foobar-tmp-f-3035-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
> of original) bytes for 6,893,896 keys.  Time: 9,855,385ms.
> 
> WARN [CompactionExecutor:1] 2011-04-20 03:48:11,308
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-3020-Data.db'),
> SSTableReader(path='foobar-f-3035-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 06:31:41,193
> CompactionManager.java (line 482) Compacted to
> foobar-tmp-f-3036-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
> of original) bytes for 6,893,896 keys.  Time: 9,809,882ms.
> 
> WARN [CompactionExecutor:1] 2011-04-20 06:32:22,476
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-3020-Data.db'),
> SSTableReader(path='foobar-f-3036-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 09:20:29,903
> CompactionManager.java (line 482) Compacted to
> foobar-tmp-f-3037-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
> of original) bytes for 6,893,896 keys.  Time: 10,087,424ms.
> -------------------------
> You can see that the compacted size is always the same: each pass
> just rewrites the same single sstable's data.
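
The effect of the proposed change to the while condition can be modelled with a small, self-contained Java sketch. It is not the Cassandra code. The class CompactionSpaceSketch, the methods fits and pickCompactable, the minFiles parameter and the file sizes are all hypothetical, and the space check is simplified to "the sum of the input sizes must not exceed the free space" (the real code asks table.getDataFileLocation for a location that can hold the expected compacted size). minFiles = 1 corresponds to "size() > 1" and minFiles = 2 to the suggested "size() > 2".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CompactionSpaceSketch {

    // Assumption: a set of files "fits" when the sum of their sizes is at
    // most the free space. This stands in for the real disk-space check.
    static boolean fits(List<Long> sizes, long freeBytes) {
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        return total <= freeBytes;
    }

    // Drops the largest file while the remaining set does not fit and more
    // than minFiles files remain. Returns the files that would be compacted,
    // or an empty list when the compaction would be aborted.
    static List<Long> pickCompactable(List<Long> sizes, long freeBytes, int minFiles) {
        List<Long> smaller = new ArrayList<>(sizes);
        while (!fits(smaller, freeBytes) && smaller.size() > minFiles) {
            Long largest = Collections.max(smaller);
            smaller.remove(largest); // remove(Object): removes the value, not an index
        }
        return fits(smaller, freeBytes) ? smaller : new ArrayList<>();
    }

    public static void main(String[] args) {
        // Illustrative sizes only: two sstables that do not fit together.
        List<Long> sstables = Arrays.asList(300L, 260L);
        long freeBytes = 280L;

        // size() > 1: falls back to compacting the single smaller file.
        System.out.println("minFiles=1 -> " + pickCompactable(sstables, freeBytes, 1));

        // size() > 2: never shrinks a pair to one file, so it aborts instead.
        System.out.println("minFiles=2 -> " + pickCompactable(sstables, freeBytes, 2));
    }
}
```

Under these assumptions, the first call selects the single smaller file. If that file has no tombstones to purge, its output is the same size as its input, so the next attempt starts from the same state, which matches the repeated log entries. The second call selects nothing and aborts.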

