ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Павлухин Иван <vololo...@gmail.com>
Subject Re: Read Repair (ex. Consistency Check) - review request #2
Date Thu, 18 Jul 2019 13:56:20 GMT
Folks,

Sorry, I was not very attentive. Could you please clarify whether (or
not) this feature is currently in an experiment state? And a couple of
obvious things:
1. If it is experimental it should be clear for a user (e.g. from javadoc).
2. If not all limitations should be described in javadocs and documentation.

ср, 17 июл. 2019 г. в 14:59, Вячеслав Коптилин <slava.koptilin@gmail.com>:
>
> Hello Anton,
>
> It's not possible, currently, to fix atomic caches.
> > You may only check the consistency. *And it's better than nothing*, I
> > think.
>
> Fair enough.
>
> >> 3. IgniteConsistencyViolationException is absolutely useless. It does not
> > >> provide any information about the issue and possible way to fix it.
> > It means that some keys from your get operation are broken.
> > IgniteConsistencyViolationException CAN be extended with a list of broken
> > keys in the future.
>
> I think it SHOULD be extended with additional fields/methods in the same
> way as CacheConsistencyViolationEvent
>
> >> Well, near caches are widely used and fully transactional, so I think it
> > >> makes sense to support the feature for near caches too.
> > As I told before, it will be nice to implement this in the future, but we
> > have more important tasks for now.
>
> I do not insist that it must be done right now.
>
>
> > >> For instance, I would like to see all these limitations on the IEP page
> > as
> > >> JIRA tickets. Perhaps, it would be good to create an epic/umbrella
> > ticket
> > >> in order to track all activities related to `Read Repair` feature.
> > Let's do this at merge day to be sure useless issues will not be created.
> >
>
> I am just trying to say that it is a good time to create tickets in order
> to track all of that otherwise the chance that all these
> limitations/improvements will not be addressed is very high.
>
> Thanks,
> S.
>
> вт, 16 июл. 2019 г. в 09:07, Anton Vinogradov <av@apache.org>:
>
> > Svala,
> >
> > >> Could you please take a look at PR:
> > Going to review today, thanks for attaching the bot visa.
> >
> > >> 1. Should I consider that my cluster is broken? There is no answer! The
> > >> false-positive result is possible.
> > That's a question about atomic nature.
> > It's not impossible to lock atomic entry to perform the check.
> > You should perform some attempts, it's your decision how many.
> > By default, atomic RR performs 3 attempts, you may increase this by setting
> > IGNITE_NEAR_GET_MAX_REMAPS or by just performing additional gets.
> >
> > >> 2. What should be done here in order to check/resolve the issue?
> > Perhaps, I
> > >> should restart a node (which one?), restart the whole cluster, put a new
> > >> value...
> > It's not possible, currently, to fix atomic caches.
> > You may only check the consistency. And it's better than nothing, I think.
> > We should find a way how to fix atomic consistency first.
> > A possible strategy is to use ЕntryProcessor which will replace all owner's
> > values with "latest" and do nothing in case newest (than latest) value
> > found (opposite to preloading approach).
> >
> > >> 3. IgniteConsistencyViolationException is absolutely useless. It does
> > not
> > >> provide any information about the issue and possible way to fix it.
> > It means that some keys from your get operation are broken.
> > IgniteConsistencyViolationException CAN be extended with a list of broken
> > keys in the future.
> >
> > >> It seems that transactional caches are covered much better.
> > Correct.
> > Tx caches consistency is more important that atomic consistency, that's why
> > it was implemented first.
> > BTW, AFAIK, atomics also were not fixed at 10078 [1].
> >
> > >> Well, near caches are widely used and fully transactional, so I think it
> > >> makes sense to support the feature for near caches too.
> > As I told before, it will be nice to implement this in the future, but we
> > have more important tasks for now.
> > The main goal was to cover tx caches, to be able to fix them in case of the
> > real problem at production.
> >
> > Summarizing the roadmap,
> > My goal now is to finish the tx case, now we have an issue with false
> > positive consistency violation [2].
> > Also, we're going to update Jepsen tests [3] with RR to ensure tx caches
> > fixed.
> > Next main goal is to use RR at TC checks [4], help with this issue are
> > appreciated.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-10078
> > [2] https://issues.apache.org/jira/browse/IGNITE-11973
> > [3] https://issues.apache.org/jira/browse/IGNITE-11972
> > [4] https://issues.apache.org/jira/browse/IGNITE-11971
> >
> >
> > On Mon, Jul 15, 2019 at 4:51 PM Dmitriy Pavlov <dpavlov@apache.org> wrote:
> >
> > > Ok,  thank you
> > >
> > > пн, 15 июл. 2019 г., 16:46 Nikolay Izhikov <nizhikov@apache.org>:
> > >
> > > > I did the review.
> > > >
> > > > пн, 15 июля 2019 г., 16:15 Dmitriy Pavlov <dpavlov@apache.org>:
> > > >
> > > > > Igniters, who did a review of
> > > > > https://issues.apache.org/jira/browse/IGNITE-10663 before the merge?
> > > > I've
> > > > > checked both PR   https://github.com/apache/ignite/pull/5656  and
> > > Issue,
> > > > > and dev.list thread and didn't find any LGTM.
> > > > >
> > > > > Anton, since you've rejected lazy consensus in our process, we have
> > RTC
> > > > in
> > > > > that (core) module. So I'd like to know if the fix was covered by
the
> > > > > review.
> > > > >
> > > > > Because you're a committer, a reviewer can be non-committer. So,
who
> > > was
> > > > a
> > > > > reviewer? Or was process ignored?
> > > > >
> > > > > пн, 15 июл. 2019 г. в 15:37, Вячеслав Коптилин
<
> > > slava.koptilin@gmail.com
> > > > >:
> > > > >
> > > > > > Hello Anton,
> > > > > >
> > > > > > > I'd like to propose you to provide fixes as a PR since
you have a
> > > > > vision
> > > > > > of how it should be made. I'll review them and merge shortly.
> > > > > > Could you please take a look at PR:
> > > > > > https://github.com/apache/ignite/pull/6689
> > > > > >
> > > > > > > Since your comments mostly about Javadoc (does this mean
that my
> > > > > solution
> > > > > > is so great that you ask me only to fix Javadocs :) ?),
> > > > > > In my humble opinion, I would consider this feature as experimental
> > > one
> > > > > (It
> > > > > > does not seem production-ready).
> > > > > > Let me clarify this with the following simple example:
> > > > > >
> > > > > >     try {
> > > > > >         atomicCache.withReadRepair().getAll(keys);
> > > > > >     }
> > > > > >     catch (CacheException e) {
> > > > > >         // What should be done here from the end-user point
of
> > view?
> > > > > >     }
> > > > > >
> > > > > > 1. Should I consider that my cluster is broken? There is no
answer!
> > > The
> > > > > > false-positive result is possible.
> > > > > > 2. What should be done here in order to check/resolve the issue?
> > > > > Perhaps, I
> > > > > > should restart a node (which one?), restart the whole cluster,
put
> > a
> > > > new
> > > > > > value...
> > > > > > 3. IgniteConsistencyViolationException is absolutely useless.
It
> > does
> > > > not
> > > > > > provide any information about the issue and possible way to
fix it.
> > > > > >
> > > > > > It seems that transactional caches are covered much better.
> > > > > >
> > > > > > > Mostly agree with you, but
> > > > > > > - MVCC is not production ready,
> > > > > > > - not sure near support really required,
> > > > > > > - metrics are better for monitoring, but the Event is enough
for
> > my
> > > > > wish
> > > > > > to
> > > > > > > cover AI with consistency check,
> > > > > > > - do we really need Platforms and Thin Client support?
> > > > > > Well, near caches are widely used and fully transactional, so
I
> > think
> > > > it
> > > > > > makes sense to support the feature for near caches too.
> > > > > > .Net is already aware of 'ReadRepair'. It seems to me, that
it can
> > be
> > > > > > easily supported for C++. I don't see a reason why it should
not be
> > > > done
> > > > > :)
> > > > > >
> > > > > > > Do you mean per partition check and recovery? That's a
good idea,
> > > > but I
> > > > > > found it's not easy to imagine API to for such tool.
> > > > > > Yep, perhaps it can be done on the idle cluster via `idle-verify`
> > > > command
> > > > > > with additional flag. Agreed, that this approach is not the
best
> > one
> > > > > > definitely.
> > > > > >
> > > > > > Thanks,
> > > > > > S.
> > > > > >
> > > > > > чт, 11 июл. 2019 г. в 09:53, Anton Vinogradov <av@apache.org>:
> > > > > >
> > > > > > > Slava,
> > > > > > >
> > > > > > > Thanks for your review first!
> > > > > > >
> > > > > > > >> Anyway, I left some comments in your pull-request
at github.
> > > > Please
> > > > > > take
> > > > > > > a
> > > > > > > >> look. The most annoying thing is poorly documented
code :(
> > > > > > > Since your comments mostly about Javadoc (does this mean
that my
> > > > > solution
> > > > > > > is so great that you ask me only to fix Javadocs :) ?),
> > > > > > > I'd like to propose you to provide fixes as a PR since
you have a
> > > > > vision
> > > > > > of
> > > > > > > how it should be made. I'll review them and merge shortly.
> > > > > > >
> > > > > > > >> By the way, is it required to add test related
to fail-over
> > > > > scenarios?
> > > > > > > The best check is to use RR at real code.
> > > > > > > For example, I'm injecting RR now to the test with concurrent
> > > > > > modifications
> > > > > > > and restarts [1].
> > > > > > >
> > > > > > > >> I just checked, the IEP page and I still cannot
find Jira
> > > tickets
> > > > > that
> > > > > > > >> should cover existing limitations/improvements.
> > > > > > > >> I would suggest creating the following tasks,
at least:
> > > > > > > Mostly agree with you, but
> > > > > > > - MVCC is not production ready,
> > > > > > > - not sure near support really required,
> > > > > > > - metrics are better for monitoring, but the Event is enough
for
> > my
> > > > > wish
> > > > > > to
> > > > > > > cover AI with consistency check,
> > > > > > > - do we really need Platforms and Thin Client support?
> > > > > > > Also, we should not produce stillborn issue.
> > > > > > > All limitations listed at proxy creation method and they
> > definitely
> > > > are
> > > > > > not
> > > > > > > showstoppers and may be fixed later if someone interested.
> > > > > > > Сoming back to AI 3.0 discussion, we have A LOT of features
and
> > > it's
> > > > > > almost
> > > > > > > impossible (require much more time that feature's cost)
to
> > support
> > > > them
> > > > > > > all.
> > > > > > > I will be pretty happy in case someone will do this and
provide
> > > help
> > > > if
> > > > > > > necessary!
> > > > > > >
> > > > > > > >> Perhaps, it would be a good idea to think about
the recovery
> > too
> > > > > > > Do you mean per partition check and recovery?
> > > > > > > That's a good idea, but I found it's not easy to imagine
API to
> > for
> > > > > such
> > > > > > > tool.
> > > > > > > In case you ready to assist with proper API/design this
will
> > > > definitely
> > > > > > > help.
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > > > > > >
> > > > > > > On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин
<
> > > > > > > slava.koptilin@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Perhaps, it would be a good idea to think about the
recovery
> > > tool/
> > > > > > > > control-utility command that will allow achieving
the same
> > goal.
> > > > > > > > If I am not mistaken it was already proposed in the
email
> > thread.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > S.
> > > > > > > >
> > > > > > > > ср, 10 июл. 2019 г. в 15:33, Вячеслав
Коптилин <
> > > > > > slava.koptilin@gmail.com
> > > > > > > >:
> > > > > > > >
> > > > > > > > > Hi Anton,
> > > > > > > > >
> > > > > > > > > Well, the ReadRepair feature is finally merged
and that is
> > good
> > > > > news
> > > > > > :)
> > > > > > > > >
> > > > > > > > > Unfortunately, I cannot find a consensus about
the whole
> > > > > > functionality
> > > > > > > in
> > > > > > > > > any of these topics:
> > > > > > > > >  -
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
> > > > > > > > >  -
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
> > > > > > > > > Also, there are no comments/discussion in JIRA.
That makes me
> > > sad
> > > > > :(
> > > > > > > > > especially when a feature is huge, not obvious
and involves
> > > > > changing
> > > > > > > > public
> > > > > > > > > API (and that is the case, I think).
> > > > > > > > >
> > > > > > > > > Anyway, I left some comments in your pull-request
at github.
> > > > Please
> > > > > > > take
> > > > > > > > a
> > > > > > > > > look. The most annoying thing is poorly documented
code :(
> > > > > > > > > By the way, is it required to add test related
to fail-over
> > > > > > scenarios?
> > > > > > > > >
> > > > > > > > > I just checked, the IEP page and I still cannot
find Jira
> > > tickets
> > > > > > that
> > > > > > > > > should cover existing limitations/improvements.
> > > > > > > > > I would suggest creating the following tasks,
at least:
> > > > > > > > >  - MVCC support
> > > > > > > > >  - Near caches
> > > > > > > > >  - Additional metrics (number of violations,
number of
> > repaired
> > > > > > entries
> > > > > > > > > etc)
> > > > > > > > >  - Ignite C++ (It looks like, .Net is already
aware of that
> > > > > feature)
> > > > > > > > >  - Thin clients support
> > > > > > > > >  - Perhaps, it would be useful to support different
> > strategies
> > > to
> > > > > > > resolve
> > > > > > > > > inconsistencies
> > > > > > > > >
> > > > > > > > > What do you think?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > S.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ср, 10 июл. 2019 г. в 10:16, Anton Vinogradov
<av@apache.org
> > >:
> > > > > > > > >
> > > > > > > > >> Folks,
> > > > > > > > >>
> > > > > > > > >> Thanks to everyone for tips and reviews.
> > > > > > > > >> Yardstick checked, no performance drop found.
> > > > > > > > >> Additional measurement: RR get() is just
up to 7% slower
> > than
> > > > > > regular
> > > > > > > > get
> > > > > > > > >> on real benchmarks (8 clients, 4 servers,
3 backups)
> > > > > > > > >> Code merged to the master.
> > > > > > > > >> "Must have" tasks created and attached to
the IEP.
> > > > > > > > >>
> > > > > > > > >> On Thu, Jul 4, 2019 at 12:18 PM Anton Vinogradov
<
> > > av@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >> > Folks,
> > > > > > > > >> >
> > > > > > > > >> > Just a minor update.
> > > > > > > > >> >
> > > > > > > > >> > RunAll [1] with enabled ReadRepair proxy
is almost green
> > now
> > > > > (~10
> > > > > > > > tests
> > > > > > > > >> > left, started with 6k :)).
> > > > > > > > >> > During the analisys, I've found some
tests with
> > > > > > > > >> > - unexpected repairs at tx caches
> > > > > > > > >> > - inconsistent state after the test
finished (different
> > > > entries
> > > > > > > across
> > > > > > > > >> the
> > > > > > > > >> > topology)
> > > > > > > > >> > For example,
> > > > > > > > >> > - testInvokeAllAppliedOnceOnBinaryTypeRegistration
> > generates
> > > > > > > obsolete
> > > > > > > > >> > versions on backups in case of retry,
fixed [2]
> > > > > > > > >> > - initial cache load generates not equal
versions on
> > > backups,
> > > > > > fixed
> > > > > > > > [3]
> > > > > > > > >> > - testAccountTxNodeRestart causes unexpected
repairs
> > > (entries
> > > > > have
> > > > > > > > >> > different versions), to be investigated.
> > > > > > > > >> >
> > > > > > > > >> > What's next?
> > > > > > > > >> >
> > > > > > > > >> > 1) Going to merge the solution once
"RunAll with
> > ReadRepair
> > > > > > enabled"
> > > > > > > > >> > becomes fully green.
> > > > > > > > >> > 2) Going to add special check after
each test which will
> > > > ensure
> > > > > > > caches
> > > > > > > > >> > content after the test is consistent.
> > > > > > > > >> > 2.1) The Same check can (should?) be
injected to
> > > > > > > > >> > awaitPartitionMapExchange() and similar
methods.
> > > > > > > > >> > 3) Update Jepsen tests with RR checks.
> > > > > > > > >> > 4) Introduce per partition RR.
> > > > > > > > >> >
> > > > > > > > >> > So, the final goal is to be sure that
Ignite produces only
> > > > > > > consistent
> > > > > > > > >> data
> > > > > > > > >> > and to have a feature to solve consistency
in case we gain
> > > > > > > > inconsistent
> > > > > > > > >> > state somehow.
> > > > > > > > >> >
> > > > > > > > >> > Limitations?
> > > > > > > > >> >
> > > > > > > > >> > Currently, RR has some limitations,
but they are not
> > related
> > > > to
> > > > > > real
> > > > > > > > >> > production cases.
> > > > > > > > >> > In case someone interested to support,
for example, MVCC
> > or
> > > > near
> > > > > > > > caches,
> > > > > > > > >> > please, feel free to contribute.
> > > > > > > > >> >
> > > > > > > > >> > [1]
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://mtcga.gridgain.com/pr.html?serverId=apache&suiteId=IgniteTests24Java8_RunAll&branchForTc=pull/6575/head&action=Latest
> > > > > > > > >> > [2]
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://github.com/apache/ignite/pull/5656/commits/6f6ec4434095e692af209c61833a350f3013408c
> > > > > > > > >> > [3]
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://github.com/apache/ignite/pull/5656/commits/255e552b474839e470c66a77e74e3c807bc76f13
> > > > > > > > >> >
> > > > > > > > >> > On Fri, Jun 28, 2019 at 2:41 PM Anton
Vinogradov <
> > > > av@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >> >
> > > > > > > > >> >> Slava,
> > > > > > > > >> >>
> > > > > > > > >> >> >> I will take a look at your
pull request if you don't
> > > mind.
> > > > > > > > >> >> Great news!
> > > > > > > > >> >>
> > > > > > > > >> >> >> In any way, could you please
update the IEP page with
> > > the
> > > > > list
> > > > > > > of
> > > > > > > > >> >> >> constraints/limitations
of the proposed approach,
> > TODOs,
> > > > > etc?
> > > > > > > > >> >> Not sure we should keep this at
IEP until list (#4 from
> > > > > original
> > > > > > > > >> letter)
> > > > > > > > >> >> is not confirmed.
> > > > > > > > >> >>
> > > > > > > > >> >> Currently, I'm checking RunAll with
RR enabled to almost
> > > each
> > > > > get
> > > > > > > > >> request.
> > > > > > > > >> >> "Almost" means: readRepair = !ctx.readThrough()
&&
> > > > > > > > >> >> ctx.config().getBackups() > 0
&& !ctx.isNear() &&
> > > > > > > !ctx.mvccEnabled()
> > > > > > > > >> >> For now I have 60 failed tests and
amount decreasing.
> > > > > > > > >> >>
> > > > > > > > >> >> >> For instance, I would like
to see all these
> > limitations
> > > on
> > > > > the
> > > > > > > IEP
> > > > > > > > >> >> page as
> > > > > > > > >> >> >> JIRA tickets. Perhaps,
it would be good to create an
> > > > > > > epic/umbrella
> > > > > > > > >> >> ticket
> > > > > > > > >> >> >> in order to track all activities
related to `Read
> > > Repair`
> > > > > > > feature.
> > > > > > > > >> >> Let's do this at merge day to be
sure useless issues will
> > > not
> > > > > be
> > > > > > > > >> created.
> > > > > > > > >> >>
> > > > > > > > >> >> On Fri, Jun 28, 2019 at 2:01 PM
Вячеслав Коптилин <
> > > > > > > > >> >> slava.koptilin@gmail.com> wrote:
> > > > > > > > >> >>
> > > > > > > > >> >>> Hi Anton,
> > > > > > > > >> >>>
> > > > > > > > >> >>> I will take a look at your pull
request if you don't
> > mind.
> > > > > > > > >> >>>
> > > > > > > > >> >>> In any way, could you please
update the IEP page with
> > the
> > > > list
> > > > > > of
> > > > > > > > >> >>> constraints/limitations of the
proposed approach, TODOs,
> > > > etc?
> > > > > > > > >> >>> For instance, I would like to
see all these limitations
> > on
> > > > the
> > > > > > IEP
> > > > > > > > >> page
> > > > > > > > >> >>> as
> > > > > > > > >> >>> JIRA tickets. Perhaps, it would
be good to create an
> > > > > > epic/umbrella
> > > > > > > > >> ticket
> > > > > > > > >> >>> in order to track all activities
related to `Read
> > Repair`
> > > > > > feature.
> > > > > > > > >> >>>
> > > > > > > > >> >>> Thanks,
> > > > > > > > >> >>> S.
> > > > > > > > >> >>>
> > > > > > > > >> >>> чт, 20 июн. 2019 г. в
14:15, Anton Vinogradov <
> > > > av@apache.org
> > > > > >:
> > > > > > > > >> >>>
> > > > > > > > >> >>> > Igniters,
> > > > > > > > >> >>> > I'm glad to introduce Read
Repair feature [0] provides
> > > > > > > additional
> > > > > > > > >> >>> > consistency guarantee for
Ignite.
> > > > > > > > >> >>> >
> > > > > > > > >> >>> > 1) Why we need it?
> > > > > > > > >> >>> > The detailed explanation
can be found at IEP-31 [1].
> > > > > > > > >> >>> > In short, because of bugs,
it's possible to gain an
> > > > > > inconsistent
> > > > > > > > >> state.
> > > > > > > > >> >>> > We need additional features
to handle this case.
> > > > > > > > >> >>> >
> > > > > > > > >> >>> > Currently we able to check
cluster using Idle_verify
> > [2]
> > > > > > > feature,
> > > > > > > > >> but
> > > > > > > > >> >>> it
> > > > > > > > >> >>> > will not fix the data,
will not even tell which
> > entries
> > > > are
> > > > > > > > broken.
> > > > > > > > >> >>> > Read Repair is a feature
to understand which entries
> > are
> > > > > > broken
> > > > > > > > and
> > > > > > > > >> to
> > > > > > > > >> >>> fix
> > > > > > > > >> >>> > them.
> > > > > > > > >> >>> >
> > > > > > > > >> >>> > 1) How it works?
> > > > > > > > >> >>> > IgniteCache now able to
provide special proxy [3]
> > > > > > > > withReadRepair().
> > > > > > > > >> >>> > This proxy guarantee that
data will be gained from all
> > > > > owners
> > > > > > > and
> > > > > > > > >> >>> compared.
> > > > > > > > >> >>> > In the case of consistency
violation situation, data
> > > will
> > > > be
> > > > > > > > >> recovered
> > > > > > > > >> >>> and
> > > > > > > > >> >>> > a special event recorded.
> > > > > > > > >> >>> >
> > > > > > > > >> >>> > 3) Naming?
> > > > > > > > >> >>> > Feature name based on Cassandra's
Read Repair feature
> > > [4],
> > > > > > which
> > > > > > > > is
> > > > > > > > >> >>> pretty
> > > > > > > > >> >>> > similar.
> > > > > > > > >> >>> >
> > > > > > > > >> >>> > 4) Limitations which can
be fixed in the future?
> > > > > > > > >> >>> >   * MVCC and Near caches
are not supported.
> > > > > > > > >> >>> >   * Atomic caches can be
checked (false positive case
> > is
> > > > > > > possible
> > > > > > > > on
> > > > > > > > >> >>> this
> > > > > > > > >> >>> > check), but can't be recovered.
> > > > > > > > >> >>> >   * Partial entry removal
can't be recovered.
> > > > > > > > >> >>> >   * Entries streamed using
data streamer (using not a
> > > > > > > "cache.put"
> > > > > > > > >> based
> > > > > > > > >> >>> > updater) and loaded by
cache.load
> > > > > > > > >> >>> >   are perceived as inconsistent
since they may have
> > > > > different
> > > > > > > > >> versions
> > > > > > > > >> >>> for
> > > > > > > > >> >>> > same keys.
> > > > > > > > >> >>> >   * Only explicit get operations
are supported
> > > > > (getAndReplace,
> > > > > > > > >> >>> getAndPut,
> > > > > > > > >> >>> > etc can be supported in
future).
> > > > > > > > >> >>> >
> > > > > > > > >> >>> > 5) What's left?
> > > > > > > > >> >>> >   * SQL/ThinClient/etc
support.
> > > > > > > > >> >>> >   * Metrics (found/repaired).
> > > > > > > > >> >>> >   * Simple per-partition
recovery feature able to work
> > > in
> > > > > the
> > > > > > > > >> >>> background in
> > > > > > > > >> >>> > addition to per-entry recovery
feature.
> > > > > > > > >> >>> >
> > > > > > > > >> >>> > 6) Is code checked?
> > > > > > > > >> >>> >   * Pull Request #5656
[5] (feature) - has green TC.
> > > > > > > > >> >>> >   * Pull Request #6575
[6] (RunAll with the feature
> > > > enabled
> > > > > > for
> > > > > > > > >> every
> > > > > > > > >> >>> get()
> > > > > > > > >> >>> > request) - has a limited
amount of failures (because
> > of
> > > > data
> > > > > > > > >> streamer,
> > > > > > > > >> >>> > cache.load, etc).
> > > > > > > > >> >>> >
> > > > > > > > >> >>> > Thoughts?
> > > > > > > > >> >>> >
> > > > > > > > >> >>> > [0]
> > https://issues.apache.org/jira/browse/IGNITE-10663
> > > > > > > > >> >>> > [1]
> > > > > > > > >> >>> >
> > > > > > > > >> >>> >
> > > > > > > > >> >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
> > > > > > > > >> >>> > [2]
> > > > > > > > >> >>> >
> > > > > > > > >> >>> >
> > > > > > > > >> >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
> > > > > > > > >> >>> > [3]
> > > > > > > > >> >>> >
> > > > > > > > >> >>> >
> > > > > > > > >> >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140
> > > > > > > > >> >>> > [4]
> > > > > > > > >> >>> >
> > > > > > > > >> >>> >
> > > > > > > > >> >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
> > > > > > > > >> >>> > [5] https://github.com/apache/ignite/pull/5656
> > > > > > > > >> >>> > [6] https://github.com/apache/ignite/pull/6575
> > > > > > > > >> >>> >
> > > > > > > > >> >>>
> > > > > > > > >> >>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >



-- 
Best regards,
Ivan Pavlukhin

Mime
View raw message