cassandra-user mailing list archives

From Nikolai Grigoriev <ngrigor...@gmail.com>
Subject Re: Compaction Strategy guidance
Date Tue, 25 Nov 2014 18:48:07 GMT
Andrei,

Oh, yes, I have scanned the top of your previous email but overlooked the
last part.

I am using SSDs, so I prefer to put in extra work to keep my system performing
and save expensive disk space. So far I've been able to size the system
more or less correctly, so these LCS limitations do not cause too much
trouble. But I do keep the CF "sharding" option as a backup; for me it would
be relatively easy to implement.
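
To make the sizing and "sharding" trade-off a bit more concrete, here is a rough sketch only. It assumes the usual LCS layout where each level holds 10x the data of the previous one, L1 holds 10 sstables, and sstable_size_in_mb is 160; none of these numbers come from this thread. The function names and the 1.8 TB single-CF figure are hypothetical, chosen for illustration.

```python
# Rough LCS sizing sketch. Assumptions (not from this thread): each level
# holds 10x the data of the previous one, L1 holds 10 sstables, and
# sstable_size_in_mb is 160.

def lcs_levels_needed(data_mb, sstable_size_mb=160, fanout=10):
    """Return the smallest number of levels whose cumulative capacity covers data_mb."""
    level = 0
    capacity_mb = 0
    while capacity_mb < data_mb:
        level += 1
        capacity_mb += sstable_size_mb * fanout ** level
    return level


def sstable_count(data_mb, sstable_size_mb=160):
    """Approximate number of sstables for a CF of the given size (ceiling division)."""
    return -(-data_mb // sstable_size_mb)


if __name__ == "__main__":
    cf_mb = 1_800_000  # hypothetical: about 1.8 TB held in a single CF
    shards = 4         # hypothetical: the same data split across 4 CFs
    print("single CF:", lcs_levels_needed(cf_mb), "levels,", sstable_count(cf_mb), "sstables")
    print("per shard:", lcs_levels_needed(cf_mb // shards), "levels,",
          sstable_count(cf_mb // shards), "sstables")
```

Under these assumptions, splitting one large CF into several smaller ones reduces the number of levels and sstables each CF has to manage, which is the idea behind keeping "sharding" as a fallback.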

On Tue, Nov 25, 2014 at 1:25 PM, Andrei Ivanov <aivanov@iponweb.net> wrote:

> Nikolai,
>
> Just in case you've missed my comment in the thread (guess you have) -
> increasing sstable size does nothing (in our case at least). That is,
> it's not worse but the load pattern is still the same - doing nothing
> most of the time. So, I switched to STCS and we will have to live with
> extra storage cost - storage is way cheaper than cpu etc anyhow:-)
>
> On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev <ngrigoriev@gmail.com>
> wrote:
> > Hi Jean-Armel,
> >
> > I am using the latest and greatest DSE 4.5.2 (4.5.3 in another cluster but there
> > are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra 2.0.10.
> >
> > I have about 1.8 TB of data per node now in total, which falls into that
> > range.
> >
> > As I said, it is really a problem with a large amount of data in a single CF,
> > not the total amount of data. Quite often the nodes are idle yet have quite a
> > bit of pending compactions. I have discussed it with other members of the C*
> > community and the DataStax guys, and they have confirmed my observation.
> >
> > I believe that increasing the sstable size won't help at all and probably
> > will make things worse - everything else being equal, of course. But I
> > would like to hear from Andrei when he is done with his test.
> >
> > Regarding the last statement - yes, C* clearly likes many small servers more
> > than fewer large ones. But it is all relative - and can all be recalculated
> > to $$$ :) C* is all about partitioning of everything - storage,
> > traffic... Less data per node and more nodes give you lower latency, lower
> > heap usage, etc. I think I have learned this with my project. Somewhat
> > the hard way but still, nothing is better than personal experience :)
> >
> > On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce <jaluce06@gmail.com> wrote:
> >>
> >> Hi Andrei, Hi Nikolai,
> >>
> >> Which version of C* are you using?
> >>
> >> There are some recommendations about the max storage per node:
> >>
> >> http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
> >>
> >> "For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
> >> handle 10x
> >> (3-5TB)".
> >>
> >> I have the feeling that those recommendations depend on many
> >> criteria, such as:
> >> - your hardware
> >> - the compaction strategy
> >> - ...
> >>
> >> It looks like LCS lowers those limits.
> >>
> >> Increasing the size of sstables might help if you have enough CPU and you
> >> can put more load on your I/O system (@Andrei, I am interested in the
> >> results of your experiment with large sstable files).
> >>
> >> From my point of view, there are some usage patterns where it is better to
> >> have many small servers than a few large servers. It is probably better to
> >> have many small servers if you need LCS for large tables.
> >>
> >> Just my 2 cents.
> >>
> >> Jean-Armel
> >>
> >> 2014-11-24 19:56 GMT+01:00 Robert Coli <rcoli@eventbrite.com>:
> >>>
> >>> On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev <ngrigoriev@gmail.com>
> >>> wrote:
> >>>>
> >>>> One of the obvious recommendations I have received was to run more than
> >>>> one instance of C* per host. Makes sense - it will reduce the amount of data
> >>>> per node and will make better use of the resources.
> >>>
> >>>
> >>> This is usually a Bad Idea to do in production.
> >>>
> >>> =Rob
> >>>
> >>
> >>
> >
> >
> >
> > --
> > Nikolai Grigoriev
> > (514) 772-5178
>



-- 
Nikolai Grigoriev
(514) 772-5178
