ignite-dev mailing list archives

From Sergey Kozlov <skoz...@gridgain.com>
Subject Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?
Date Thu, 03 Oct 2019 19:50:00 GMT
Hi

I'm not sure that taking a node offline is the best way to do that.
Cons:
 - different caches may have different degrees of fragmentation, but we are
   forced to stop the whole node
 - taking a node offline is a maintenance operation that will require adding
   +1 backup to reduce the risk of data loss
 - baseline auto adjustment?
 - impact on index rebuild?
 - cache configuration changes (or cache destroy) while the node is offline

What about other ways that do not require stopping the node? E.g. take a
cache group offline on a node? Add a *defrag <cache_group>* command to
control.sh that forces a rebalance to start internally in the node, with an
expected impact on performance.



On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov <av@apache.org> wrote:

> Alexey,
> As for me, it does not matter whether it is an IEP, an umbrella ticket or
> a single issue. The most important thing is the assignee :)
>
> On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk
> <alexey.goncharuk@gmail.com> wrote:
>
> > Anton, do you think we should file a single ticket for this or should we
> > go with an IEP? As of now, the change does not look big enough for an IEP
> > to me.
> >
> > On Thu, Oct 3, 2019 at 11:18, Anton Vinogradov <av@apache.org> wrote:
> >
> > > Alexey,
> > >
> > > Sounds good to me.
> > >
> > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk
> > > <alexey.goncharuk@gmail.com> wrote:
> > >
> > > > Anton,
> > > >
> > > > Switching a partition to and from the SHRINKING state will require
> > > > intricate synchronizations in order to properly determine the start
> > > > position for historical rebalance without PME.
> > > >
> > > > I would still go with an offline-node approach, but instead of cleaning
> > > > the persistence, we can do effective defragmentation when the node is
> > > > offline because we are sure that there is no concurrent load. After the
> > > > defragmentation completes, we bring the node back to the cluster and
> > > > historical rebalance will kick in automatically. It will still require
> > > > manual node restarts, but since the data is not removed, there are no
> > > > additional risks. Also, this will be an excellent solution for those who
> > > > can afford downtime and execute the defragment command on all nodes in
> > > > the cluster simultaneously - this will be the fastest way possible.
> > > >
> > > > --AG
> > > >
> > > > On Mon, Sep 30, 2019 at 09:29, Anton Vinogradov <av@apache.org> wrote:
> > > >
> > > > > Alexei,
> > > > > >> stopping fragmented node and removing partition data, then
> > > > > >> starting it again
> > > > >
> > > > > That's exactly what we're doing to solve the fragmentation issue.
> > > > > The problem here is that we have to perform N/B restart-rebalance
> > > > > operations (N - cluster size, B - backups count), which takes a lot
> > > > > of time and carries a risk of losing data.
> > > > >
> > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > >
> > > > > > Probably this should be allowed via the public API; actually this
> > > > > > is the same as manual rebalancing.
> > > > > >
> > > > > > On Fri, Sep 27, 2019 at 17:40, Alexei Scherbakov
> > > > > > <alexey.scherbakoff@gmail.com> wrote:
> > > > > >
> > > > > > > The poor man's solution for the problem would be stopping the
> > > > > > > fragmented node and removing partition data, then starting it
> > > > > > > again, allowing a full state transfer, this time without deletes.
> > > > > > > Rinse and repeat for all owners.
> > > > > > >
> > > > > > > Anton Vinogradov, would this work for you as a workaround?
> > > > > > >
> > > > > > > On Thu, Sep 19, 2019 at 13:03, Anton Vinogradov <av@apache.org> wrote:
> > > > > > >
> > > > > > >> Alexey,
> > > > > > >>
> > > > > > > >> Let's combine your and Ivan's proposals.
> > > > > > > >>
> > > > > > > >> >> vacuum command, which acquires exclusive table lock, so no
> > > > > > > >> >> concurrent activities on the table are possible.
> > > > > > > >> and
> > > > > > > >> >> Could the problem be solved by stopping a node which needs to
> > > > > > > >> >> be defragmented, clearing persistence files and restarting the
> > > > > > > >> >> node?
> > > > > > > >> >> After rebalancing the node will receive all data back without
> > > > > > > >> >> fragmentation.
> > > > > > >>
> > > > > > > >> How about having a special partition state, SHRINKING?
> > > > > > > >> This state should mean that the partition is unavailable for
> > > > > > > >> reads and updates but should keep its update counters and
> > > > > > > >> should not be marked as lost, renting or evicted.
> > > > > > > >> In this state we are able to iterate over the partition and
> > > > > > > >> apply its entries to another file in a compact way.
> > > > > > > >> Indices should be updated during the copy-on-shrink procedure
> > > > > > > >> or at the shrink completion.
> > > > > > > >> Once the shrunk file is ready we should replace the original
> > > > > > > >> partition file with it and mark it as MOVING, which will start
> > > > > > > >> the historical rebalance.
> > > > > > > >> Shrinking should be performed during low-activity periods, but
> > > > > > > >> even if we find that activity was high and historical rebalance
> > > > > > > >> is not suitable, we may just remove the file and use regular
> > > > > > > >> rebalance to restore the partition (this will also lead to a
> > > > > > > >> shrink).
> > > > > > >>
> > > > > > > >> BTW, it seems we are able to implement partition shrink in a
> > > > > > > >> cheap way.
> > > > > > > >> We may just use the rebalancing code to apply the fat
> > > > > > > >> partition's entries to the new file.
> > > > > > > >> So, 3 stages here: local rebalance, indices update and global
> > > > > > > >> historical rebalance.
> > > > > > >>
> > > > > > >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> > > > > > >> alexey.goncharuk@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Anton,
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > > >>  The solution which Anton suggested
does not look easy
> > > > because
> > > > > it
> > > > > > >> will
> > > > > > >> > > most likely significantly hurt performance
> > > > > > >> > > Mostly agree here, but what drop do we expect?
What price
> do
> > > we
> > > > > > ready
> > > > > > >> to
> > > > > > >> > > pay?
> > > > > > >> > > Not sure, but seems some vendors ready to
pay, for
> example,
> > 5%
> > > > > drop
> > > > > > >> for
> > > > > > >> > > this.
> > > > > > >> >
> > > > > > >> > 5% may be a big drop for some use-cases, so I
think we
> should
> > > look
> > > > > at
> > > > > > >> how
> > > > > > >> > to improve performance, not how to make it worse.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > >
> > > > > > > >> > > >> it is hard to maintain a data structure to choose "page
> > > > > > > >> > > >> from free-list with enough space closest to the
> > > > > > > >> > > >> beginning of the file".
> > > > > > > >> > > We can just split each free-list bucket into a pair and use
> > > > > > > >> > > the first for pages in the first half of the file and the
> > > > > > > >> > > second for pages in the second half.
> > > > > > > >> > > Only two buckets are required here since, during the file
> > > > > > > >> > > shrink, the first bucket's window will shrink too.
> > > > > > > >> > > It seems this gives us the same cost on put: just use the
> > > > > > > >> > > first bucket if it is not empty.
> > > > > > > >> > > The cost of remove (with merge) will increase, of course.
> > > > > > > >> > >
> > > > > > > >> > > The compromise solution is to have a priority put (to the
> > > > > > > >> > > first part of the file), keeping removal as is, plus
> > > > > > > >> > > schedulable per-page migration for the rest of the data
> > > > > > > >> > > during low-activity periods.
> > > > > > >> > >
> > > > > > > >> > Free lists are large and slow by themselves, and it is
> > > > > > > >> > expensive to checkpoint them and read them on start, so as a
> > > > > > > >> > long-term solution I would look into removing them. Moreover,
> > > > > > > >> > I am not sure that adding yet another background process will
> > > > > > > >> > improve the codebase reliability and simplicity.
> > > > > > > >> >
> > > > > > > >> > If we want to go the hard path, I would look at a free page
> > > > > > > >> > tracking bitmap - a special bitmask page, where each page in
> > > > > > > >> > an adjacent block is marked as 0 (free) if it has more free
> > > > > > > >> > space than a certain configurable threshold (say, 80%), and
> > > > > > > >> > as 1 (full) if it has less. Some vendors have successfully
> > > > > > > >> > implemented this approach, which looks much more promising,
> > > > > > > >> > but is harder to implement.
> > > > > > >> >
> > > > > > >> > --AG
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Alexei Scherbakov
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Alexei Scherbakov
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com
