cassandra-user mailing list archives

From shimi <shim...@gmail.com>
Subject Re: Reclaim deleted rows space
Date Mon, 10 Jan 2011 14:00:14 GMT
I modified the code to limit the size of the SSTables.
I would be glad if someone could take a look at it:

https://github.com/Shimi/cassandra/tree/cassandra-0.6

Shimi

On Fri, Jan 7, 2011 at 2:04 AM, Jonathan Shook <jshook@gmail.com> wrote:

> I believe the following condition within submitMinorIfNeeded(...)
> determines whether to continue, so it's not a hard loop.
>
> // if (sstables.size() >= minThreshold) ...
>
>
>
> On Thu, Jan 6, 2011 at 2:51 AM, shimi <shimi.k@gmail.com> wrote:
> > According to the code, it makes sense.
> > submitMinorIfNeeded() calls doCompaction(), which in turn calls
> > submitMinorIfNeeded().
> > With minimumCompactionThreshold = 1, submitMinorIfNeeded() will always run
> > a compaction.
> >
> > Shimi
> > On Thu, Jan 6, 2011 at 10:26 AM, shimi <shimi.k@gmail.com> wrote:
> >>
> >>
> >> On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>>
> >>> Pretty sure there's logic in there that says "don't bother compacting
> >>> a single sstable."
> >>
> >> No. You can do it.
> >> Based on the log, I have a feeling that it triggers an infinite
> >> compaction loop.
> >>
> >>>
> >>> On Wed, Jan 5, 2011 at 2:26 PM, shimi <shimi.k@gmail.com> wrote:
> >>> > How is minor compaction triggered? Is it triggered only when a new
> >>> > SSTable is added?
> >>> >
> >>> > I was wondering if triggering a compaction with
> >>> > minimumCompactionThreshold set to 1 would be useful. If this can
> >>> > happen, I assume it will compact files of similar size and remove
> >>> > deleted rows from the rest.
> >>> > Shimi
> >>> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:
> >>> >>
> >>> >> > I don't have a problem with disk space. I have a problem with the
> >>> >> > data size.
> >>> >>
> >>> >> [snip]
> >>> >>
> >>> >> > Bottom line is that I want to reduce the number of requests that
> >>> >> > go to disk. Since there is enough data that is no longer valid, I
> >>> >> > can do it by reclaiming the space. The only way to do it is by
> >>> >> > running a major compaction.
> >>> >> > I can wait and let Cassandra do it for me, but then the data size
> >>> >> > will get even bigger and the response time will be worse. I can do
> >>> >> > it manually, but I prefer it to happen in the background with less
> >>> >> > impact on the system.
> >>> >>
> >>> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
> >>> >>
> >>> >> So essentially, for workloads that are teetering on the edge of
> >>> >> cache warmness and are subject to significant overwrites or
> >>> >> removals, it may be beneficial to perform much more aggressive
> >>> >> background compaction, even though it might waste lots of CPU, to
> >>> >> keep the in-memory working set down.
> >>> >>
> >>> >> There was talk (I think in the compaction redesign ticket) about
> >>> >> potentially improving the use of bloom filters such that obsolete
> >>> >> data in sstables could be eliminated from the read set without
> >>> >> necessitating actual compaction; that might help address cases like
> >>> >> these too.
> >>> >>
> >>> >> I don't think there's a pre-existing silver bullet in a current
> >>> >> release; you probably have to live with the need for
> >>> >> greater-than-theoretically-optimal memory requirements to keep the
> >>> >> working set in memory.
> >>> >>
> >>> >> --
> >>> >> / Peter Schuller
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Jonathan Ellis
> >>> Project Chair, Apache Cassandra
> >>> co-founder of Riptano, the source for professional Cassandra support
> >>> http://riptano.com
> >>
> >
> >
>
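
The disagreement in the quoted thread (is it a hard loop or not?) comes down to how the guard `sstables.size() >= minThreshold` behaves after each compaction. The following is a minimal sketch of that reasoning, not Cassandra's actual code: the class `CompactionLoopSketch`, the method `simulate`, and the `maxRounds` safety cap are hypothetical names introduced for illustration, and the model assumes a single bucket in which every compaction merges all SSTables into one.

```java
public class CompactionLoopSketch {

    /**
     * Simplified model of the submitMinorIfNeeded() -> doCompaction() ->
     * submitMinorIfNeeded() cycle for one bucket of SSTables.
     *
     * Returns how many compaction rounds ran. The maxRounds cap is an
     * assumption of this sketch so that a non-terminating configuration
     * can be observed without hanging.
     */
    public static int simulate(int sstableCount, int minThreshold, int maxRounds) {
        int count = sstableCount;
        int rounds = 0;
        // Same shape as the guard quoted in the thread:
        // if (sstables.size() >= minThreshold) ...
        while (count >= minThreshold && rounds < maxRounds) {
            // Assumption: one compaction merges the whole bucket into one SSTable.
            count = 1;
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        // With a threshold of 4, the single merged SSTable no longer
        // satisfies the guard, so the cycle stops after one round.
        System.out.println("minThreshold=4: " + simulate(4, 4, 1000) + " round(s)");

        // With a threshold of 1, the merged SSTable still satisfies the
        // guard, so only the safety cap ends the cycle.
        System.out.println("minThreshold=1: " + simulate(4, 1, 1000) + " round(s)");
    }
}
```

Under these assumptions both observations in the thread hold: the guard does end the cycle for any threshold of 2 or more, because each round leaves a single SSTable, while a threshold of 1 is always satisfied by that one remaining SSTable.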
