cassandra-user mailing list archives

From Nikolai Grigoriev <ngrigor...@gmail.com>
Subject Re: Compaction Strategy guidance
Date Mon, 24 Nov 2014 03:37:41 GMT
Just to clarify - when I was talking about the large amount of data I
really meant a large amount of data per node in a single CF (table). LCS
does not seem to like it when it gets thousands of sstables (which makes
4-5 levels).
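
As a rough illustration of why thousands of sstables translate into 4-5 levels, here is a minimal Python sketch. It assumes the commonly documented LCS layout in which level n holds up to 10^n sstables (a fanout of 10); the function name and the sample counts are made up for illustration and are not taken from Cassandra itself.

```python
def lcs_levels_needed(sstable_count, fanout=10):
    """Return the highest level needed to hold sstable_count sstables
    above L0, assuming level n holds at most fanout**n sstables."""
    level = 0
    capacity = 0  # total sstables that levels 1..level can hold
    while capacity < sstable_count:
        level += 1
        capacity += fanout ** level
    return level


# Illustrative sstable counts only, not measurements from a real cluster.
for count in (1000, 5000, 11000, 50000):
    print(count, "sstables ->", lcs_levels_needed(count), "levels")
```

Under these assumptions anything between roughly 1.1K and 11K sstables needs 4 levels, and the next order of magnitude needs 5.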

When bootstrapping a new node you'd better enable the option from
CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
mess - I have a node that I bootstrapped ~2 weeks ago. Initially it
had 7.5K pending compactions, now it has almost stabilized at 4.6K. It does
not go down. The number of sstables at L0 is over 11K and it is very slowly
building upper levels. The total number of sstables is 4x the normal amount.
Now I am not entirely sure if this node will ever get back to normal life.
And believe me - this is not because of I/O, I have SSDs everywhere and 16
physical cores. This machine is barely using 1-3 cores most of the time.
The problem is that allowing STCS fallback is not a good option either - it
will quickly result in a few 200 GB+ sstables in my configuration and then
these sstables will never be compacted. Plus, it will require close to 2x
disk space on EVERY disk in my JBOD configuration... this will kill the node
sooner or later. This is all because all sstables end up at L0 after
bootstrap and then the process very slowly moves them to other levels. If
you have write traffic to that CF then the number of sstables at L0 will
grow quickly - like it happens in my case now.
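
To put rough numbers on the disk-space concern: a compaction writes its output before the inputs can be deleted, so in the worst case (nothing overwritten or expired) it needs free space about equal to the total size of its inputs. The sketch below compares one STCS compaction of large sstables with one LCS compaction. The values that are not from this thread are assumptions: four 200 GB inputs (STCS's default min_threshold of 4), a 160 MB LCS sstable size, and about 10 overlapping sstables in the next level. The function name is hypothetical.

```python
GB = 1024  # MB per GB


def worst_case_temp_space_mb(input_sizes_mb):
    """Upper bound on the extra disk space one compaction needs: the output
    can be as large as all inputs combined when nothing is purged."""
    return sum(input_sizes_mb)


# STCS: four similarly sized 200 GB sstables merged in one compaction.
stcs_mb = worst_case_temp_space_mb([200 * GB] * 4)

# LCS: one sstable from level n plus ~10 overlapping sstables from level n+1.
lcs_mb = worst_case_temp_space_mb([160] * 11)

print(f"STCS: {stcs_mb / GB:.0f} GB, LCS: {lcs_mb / GB:.2f} GB")
```

With JBOD the large figure has to be available on whichever disk receives the output, which is why the headroom is needed on every disk.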

Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301 is
implemented it may be better.


On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov <aivanov@iponweb.net> wrote:

> Stephane,
>
> We have a somewhat similar C* load profile. Hence some comments
> in addition to Nikolai's answer.
> 1. Fallback to STCS - you can disable it actually
> 2. Based on our experience, if you have a lot of data per node, LCS
> may work just fine. That is, till the moment you decide to join
> another node - chances are that the newly added node will not be able
> to compact what it gets from old nodes. In your case, if you switch
> strategy the same thing may happen. This is all due to limitations
> mentioned by Nikolai.
>
> Andrei,
>
>
> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. <smgesi@gmail.com>
> wrote:
> > ABUSE
> >
> >
> >
> > I DON'T WANT ANY MORE EMAILS, I AM FROM MEXICO
> >
> >
> >
> > From: Nikolai Grigoriev [mailto:ngrigoriev@gmail.com]
> > Sent: Saturday, November 22, 2014 07:13 p.m.
> > To: user@cassandra.apache.org
> > Subject: Re: Compaction Strategy guidance
> > Importance: High
> >
> >
> >
> > Stephane,
> >
> > Like everything good, LCS comes at a certain price.
> >
> > LCS will put the most load on your I/O system (if you use spindles - you
> > may need to be careful about that) and on CPU. Also LCS (by default) may
> > fall back to STCS if it is falling behind (which is very possible with
> > heavy writing activity) and this will result in higher disk space usage.
> > Also LCS has a certain limitation I have discovered lately. Sometimes LCS
> > may not be able to use all your node's resources (algorithm limitations)
> > and this reduces the overall compaction throughput. This may happen if
> > you have a large column family with lots of data per node. STCS won't
> > have this limitation.
> >
> >
> >
> > By the way, the primary goal of LCS is to reduce the number of sstables
> > C* has to look at to find your data. With LCS properly functioning this
> > number will most likely be between 1 and 3 for most of the reads. But if
> > you do few reads and are not concerned about the latency today, most
> > likely LCS may only save you some disk space.
> >
> >
> >
> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay <slegay@looplogic.com>
> > wrote:
> >
> > Hi there,
> >
> >
> >
> > use case:
> >
> >
> >
> > - Heavy write app, few reads.
> >
> > - Lots of updates of rows / columns.
> >
> > - Current performance is fine, for both writes and reads.
> >
> > - Currently using SizeTieredCompactionStrategy
> >
> >
> >
> > We're trying to limit the amount of storage used during compaction.
> > Should we switch to LeveledCompactionStrategy?
> >
> >
> >
> > Thanks
> >
> >
> >
> >
> > --
> >
> > Nikolai Grigoriev
> > (514) 772-5178
>



-- 
Nikolai Grigoriev
(514) 772-5178
