cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-8571) Free space management does not work very well
Date Wed, 07 Jan 2015 00:05:37 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-8571.
---------------------------------------
       Resolution: Duplicate
    Fix Version/s:     (was: 2.1.3)

> Free space management does not work very well
> ---------------------------------------------
>
>                 Key: CASSANDRA-8571
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8571
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bartłomiej Romański
>
> Hi all,
> We run an 18-node cluster on 2.1.2. Each node has 3x 480 GB SSDs (JBOD). We
mostly use LCS.
> Recently, our nodes started failing with 'no space left on device'. It all started with
our mistake - we let our LCS accumulate too much in L0.
> As a result, STCS woke up and we ended up with some big sstables on each node (let's say 5-10
sstables, 20-50 GB each).
> During normal operation we keep our disks about 50% full. This gives about 200 GB of free
space on each of them. This was too little for compacting all accumulated L0 sstables at once.
Cassandra kept trying to do that and kept failing...
> Eventually, we managed to stabilize the situation (with some crazy code hacking, manually
moving sstables etc...). However, there are a few things that would be more than helpful in
recovering from such situations more automatically...
> First, please look at DiskAwareRunnable.runMayThrow(). This method initializes a local
variable, writeSize. I believe we should check somewhere here whether we have enough space on the
chosen disk. The problem is that writeSize is never read... Am I missing something here?
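
A minimal sketch of the kind of check suggested in the first point: compare the estimated write size against the free space of each candidate disk before committing to one. The class, the Disk type, and pickDiskFor() are hypothetical names used for illustration only; this is not the code in DiskAwareRunnable.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; the types and names here are illustrative, not Cassandra's.
final class DiskSpaceCheck {
    /** Stand-in for a data directory and its usable space in bytes. */
    static final class Disk {
        final String path;
        final long freeBytes;

        Disk(String path, long freeBytes) {
            this.path = path;
            this.freeBytes = freeBytes;
        }
    }

    /** Returns the first disk able to hold writeSize bytes, or empty if none can. */
    static Optional<Disk> pickDiskFor(long writeSize, List<Disk> disks) {
        for (Disk d : disks) {
            if (d.freeBytes >= writeSize) {
                return Optional.of(d);
            }
        }
        return Optional.empty();
    }
}
```

An empty result lets the caller skip or shrink the task up front rather than fail partway through with 'no space left on device'.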
> Btw, in STCS we first look for the least overloaded disk, and then (if there is
more than one such disk) for the one with the most free space (please note the sort order
in Directories.getWriteableLocation()). That's often suboptimal (it's usually better to wait
for the bigger disk than to compact fewer sstables now), but probably not crucial.
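
The difference between the two sort orders can be sketched with a pair of comparators. This is an illustration under assumed names (Candidate, LEAST_LOADED_FIRST, MOST_FREE_FIRST); it does not reproduce Directories.getWriteableLocation().

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; Candidate and both comparators are illustrative names.
final class DiskOrdering {
    static final class Candidate {
        final String path;
        final int runningTasks;
        final long freeBytes;

        Candidate(String path, int runningTasks, long freeBytes) {
            this.path = path;
            this.runningTasks = runningTasks;
            this.freeBytes = freeBytes;
        }
    }

    // Order described in the report: fewest running tasks, ties broken by most free space.
    static final Comparator<Candidate> LEAST_LOADED_FIRST =
        Comparator.comparingInt((Candidate c) -> c.runningTasks)
                  .thenComparing(Comparator.comparingLong((Candidate c) -> c.freeBytes).reversed());

    // Alternative order: most free space, ties broken by fewest running tasks.
    static final Comparator<Candidate> MOST_FREE_FIRST =
        Comparator.comparingLong((Candidate c) -> c.freeBytes).reversed()
                  .thenComparingInt(c -> c.runningTasks);

    /** Returns the candidate that sorts first under the given order. */
    static Candidate best(List<Candidate> candidates, Comparator<Candidate> order) {
        return Collections.min(candidates, order);
    }
}
```

With an idle but nearly full disk and a busy but mostly empty one, the first order picks the idle disk and the second picks the emptier one.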
> Second, the strategy (used by LCS) of first choosing a target disk and then using it for the
whole compaction is not the best one. Big compactions (e.g. after massive operations
like bootstrap or repair, or after issues with LCS like in our case) on small drives
(e.g. a JBOD of SSDs) will never succeed. A much better strategy would be to choose the target
drive for each output sstable separately, or at least to round-robin them.
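
A rough sketch of the round-robin idea from the second point: each output sstable is assigned to the next disk in turn that still has room, so a compaction larger than any single disk's free space can still complete. RoundRobinDisks and assign() are hypothetical names; the real compaction code is organized differently.

```java
// Hypothetical sketch; RoundRobinDisks is an illustrative name, not a Cassandra class.
final class RoundRobinDisks {
    private final long[] freeBytes; // remaining free space per disk
    private int next = 0;           // disk to try first for the next output sstable

    RoundRobinDisks(long[] freeBytes) {
        this.freeBytes = freeBytes.clone();
    }

    /**
     * Picks a disk for one output sstable, starting from the round-robin
     * position and skipping disks that are too full. Reserves the space and
     * returns the disk index, or -1 if no disk can hold the sstable.
     */
    int assign(long sstableSize) {
        for (int tried = 0; tried < freeBytes.length; tried++) {
            int i = (next + tried) % freeBytes.length;
            if (freeBytes[i] >= sstableSize) {
                freeBytes[i] -= sstableSize;
                next = (i + 1) % freeBytes.length;
                return i;
            }
        }
        return -1;
    }
}
```

For example, with three disks of 200 units free each, three outputs of 160 units land on disks 0, 1, and 2, although their combined 480 units would not fit on any one disk.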
> Third, it would be helpful if the default check for MAX_COMPACTING_L0 in LeveledManifest.getCandidatesFor()
were extended to also support a limit on total space. After falling back to STCS in L0 you end
up with very big sstables, and 32 of them is just too much for one compaction on small drives.
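
The third point could look roughly like the sketch below: candidates are accepted until either the count limit or a total-size limit is reached. The method and parameter names (limit, maxCount, maxTotalBytes) are assumptions for illustration, and sstables are reduced to their sizes; this is not the logic in LeveledManifest.getCandidatesFor().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; sstables are represented only by their sizes in bytes.
final class L0CandidateLimiter {
    /**
     * Takes candidates in order until adding the next one would exceed
     * either the count limit or the total-size limit.
     */
    static List<Long> limit(List<Long> sstableSizes, int maxCount, long maxTotalBytes) {
        List<Long> chosen = new ArrayList<>();
        long total = 0;
        for (long size : sstableSizes) {
            if (chosen.size() >= maxCount || total + size > maxTotalBytes) {
                break;
            }
            chosen.add(size);
            total += size;
        }
        return chosen;
    }
}
```

Note that in this sketch a first sstable larger than maxTotalBytes yields an empty selection; a real implementation would need a policy for that case.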
> We finally used a hack similar to the last option (as it was the easiest one to implement
in a hurry), but any of the improvements described above would have saved us from all this.
> Thanks,
> BR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
