incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: Frequent updates of freshly written columns
Date Fri, 18 Feb 2011 18:16:18 GMT
On Fri, Feb 18, 2011 at 6:19 PM, Aklin_81 <asdkl93@gmail.com> wrote:

> Sylvain,
> I also need to store data that is frequently updated: the same column
> is updated several times during each user session, at each action
> by the user. But this data is not very fresh, so when I update this
> column frequently there would be many versions of the same column in
> several sst files!
> Reading this type of data would not be too efficient, I guess, as the
> row would be totally scattered!
>
> Could there be any better strategy to store such data in cassandra?


> (Since the column holds an aggregate data obtained from all actions of
> the users, I have the need of updating that same column again & again)
>

That's what compaction is for. Even if the column is scattered across
many sstables, compaction should keep that down to a handful of them. Chances
are you won't see too bad a read performance. Beyond that, tweaking the
memtable thresholds so that you don't flush too often will also help.
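
To make the compaction point concrete, here is a minimal sketch in Python. It is
a toy model, not Cassandra's actual code: an "sstable" is assumed to be a plain
dict from column name to (timestamp, value), and the `compact` function is a
name I made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an "sstable" maps column name -> (timestamp, value).
SSTable = Dict[str, Tuple[int, str]]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge several sstables into one, keeping the newest version of each column."""
    merged: SSTable = {}
    for table in sstables:
        for column, (timestamp, value) in table.items():
            current = merged.get(column)
            # Keep the version with the highest timestamp seen so far.
            if current is None or timestamp > current[0]:
                merged[column] = (timestamp, value)
    return merged


# Three flushes, each holding a different version of the "total" column.
old = {"total": (1, "10"), "name": (1, "alice")}
newer = {"total": (2, "15")}
newest = {"total": (3, "21")}

compacted = compact([old, newer, newest])
print(compacted)
```

After the merge, a read of "total" only has one file to look at instead of three.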

Now, I don't know exactly what your use case is or what this aggregate is. But
if there is a natural way to split this aggregate into multiple columns, so
that each update touches only one of the columns forming the aggregate, that
should help. It really depends on what we are talking about.
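
As a purely illustrative sketch of such a split, assume the aggregate is a
simple sum over action types (I don't know that it is) and that the client
already knows the current value of the one column it updates. The row is
modelled as a plain Python dict, and `record_action` and `read_aggregate` are
hypothetical names, not part of any Cassandra client.

```python
from typing import Dict

# Hypothetical row layout: one column per action type instead of a single
# "aggregate" column that every action rewrites.
Row = Dict[str, int]


def record_action(row: Row, action: str, amount: int = 1) -> None:
    """Write only the column for this action type; other columns are untouched."""
    row[action] = row.get(action, 0) + amount


def read_aggregate(row: Row) -> int:
    """Rebuild the aggregate at read time from its component columns."""
    return sum(row.values())


row: Row = {}
record_action(row, "click")
record_action(row, "click")
record_action(row, "purchase", 5)
print(read_aggregate(row))
```

Each write then produces a new version of one small column rather than of the
whole aggregate.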


> Another doubt: when an old column has been updated and exists in the
> memtable, but other versions of the column exist in SSTables, do the
> reads also scan the SSTables for that column after the memtable, or is
> it smart enough to know that this column is the most recent one?
>

It can't skip the sstables. The problem is that you never know whether the
value in the memtable is more recent than the ones in the sstables. To take a
concrete example, suppose a node was down. When it comes back up, chances are
that it will see new updates before it sees the old updates that were made
while it was down (those will arrive via Hinted Handoff, read repair or
repair). More generally, there is never any guarantee that messages will
arrive at the replicas in the order they were received by the coordinator(s).
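
Here is a small sketch of that scenario, again as a toy model rather than the
real read path. The memtable and sstables are assumed to be plain dicts from
column name to (timestamp, value), and `read_column` is a made-up helper name.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, str]  # (client timestamp, value)
Store = Dict[str, Version]


def read_column(column: str, memtable: Store, sstables: List[Store]) -> Optional[Version]:
    """Check the memtable and every sstable, and return the newest version found."""
    candidates = [store[column] for store in [memtable, *sstables] if column in store]
    if not candidates:
        return None
    return max(candidates, key=lambda version: version[0])


# The node came back up and first received the new update (timestamp 7),
# which has since been flushed to an sstable.
sstables = [{"total": (7, "21")}]
# The older update (timestamp 5), made while the node was down, arrives later
# (e.g. through hinted handoff) and is sitting in the memtable.
memtable = {"total": (5, "15")}

result = read_column("total", memtable, sstables)
print(result)
```

Stopping at the memtable here would return the stale value written at
timestamp 5, which is why the sstables have to be consulted too.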

--
Sylvain


>
> > On Fri, Feb 18, 2011 at 8:54 PM, James Churchman <
> jameschurchman@gmail.com> wrote:
> >>
> >> ok great, thanks for the exact clarification
> >> On 18 Feb 2011, at 14:11, Aklin_81 wrote:
> >>
> >> Compaction does not 'mutate' the sst files, it 'merges' several sst
> files into one with new indexes, merged data rows & deleting tombstones.
> Thus you reclaim your disk space.
> >>
> >>
> >> On Fri, Feb 18, 2011 at 7:34 PM, James Churchman <
> jameschurchman@gmail.com> wrote:
> >>>
> >>> but a compaction will mutate the sstables and reclaim the
> space (eventually)  ?
> >>>
> >>> james
> >>> On 18 Feb 2011, at 08:36, Sylvain Lebresne wrote:
> >>>
> >>> On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81 <asdkl93@gmail.com> wrote:
> >>>>
> >>>> Are the very freshly written columns to a row in memtables,
> efficiently updated/overwritten by edited/new column values.
> >>>>
> >>>> After flushing of memtable, are those(edited + unedited ones) columns
> stored together on disk (in same blocks!?) as if they were written in one
> single operation or same time ?? I know if old columns are edited then
> several copies of same column will be dispersed in different sst tables,
> what about fresh columns ?
> >>>>
> >>>> Are there any disadvantages to frequently updating fresh columns
> present in memtable ?
> >>>
> >>> The SSTables are immutable but the memtables are not. As long as you
> update/overwrite a column that is still in a memtable, it is simply replaced
> in memory (so it's as efficient as it gets).
> >>> In other words, when the memtable is flushed, only the last version of
> the column goes in.
> >>> --
> >>> Sylvain
> >>
> >>
> >
>
