ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anton Vinogradov ...@apache.org>
Subject Re: Read Repair (ex. Consistency Check) - review request #2
Date Fri, 19 Jul 2019 05:31:42 GMT
Ivan,

My code is always production-ready, any doubts? :)

Anyway, that's a good question.
It's a good case to mark feature experimental somehow while it's roadmap
not finished.
Was MVCC marked as experimental?
API can be marked as Experimental [1] since java 9, but we're not able to
use this annotation for now.

Anyway again, seems it depends on the release date will RR be marked as
experimental or not.
In case we'll finish roadmap before 2.8 release there will be no need for
such mark.
So, let's assume it's production-ready until another state is not confirmed
:)

[1] https://docs.oracle.com/javase/9/docs/api/jdk/jfr/Experimental.html

On Thu, Jul 18, 2019 at 4:56 PM Павлухин Иван <vololo100@gmail.com> wrote:

> Folks,
>
> Sorry, I was not very attentive. Could you please clarify whether (or
> not) this feature is currently in an experiment state? And a couple of
> obvious things:
> 1. If it is experimental it should be clear for a user (e.g. from javadoc).
> 2. If not all limitations should be described in javadocs and
> documentation.
>
> ср, 17 июл. 2019 г. в 14:59, Вячеслав Коптилин <slava.koptilin@gmail.com>:
> >
> > Hello Anton,
> >
> > It's not possible, currently, to fix atomic caches.
> > > You may only check the consistency. *And it's better than nothing*, I
> > > think.
> >
> > Fair enough.
> >
> > >> 3. IgniteConsistencyViolationException is absolutely useless. It does
> not
> > > >> provide any information about the issue and possible way to fix it.
> > > It means that some keys from your get operation are broken.
> > > IgniteConsistencyViolationException CAN be extended with a list of
> broken
> > > keys in the future.
> >
> > I think it SHOULD be extended with additional fields/methods in the same
> > way as CacheConsistencyViolationEvent
> >
> > >> Well, near caches are widely used and fully transactional, so I think
> it
> > > >> makes sense to support the feature for near caches too.
> > > As I told before, it will be nice to implement this in the future, but
> we
> > > have more important tasks for now.
> >
> > I do not insist that it must be done right now.
> >
> >
> > > >> For instance, I would like to see all these limitations on the IEP
> page
> > > as
> > > >> JIRA tickets. Perhaps, it would be good to create an epic/umbrella
> > > ticket
> > > >> in order to track all activities related to `Read Repair` feature.
> > > Let's do this at merge day to be sure useless issues will not be
> created.
> > >
> >
> > I am just trying to say that it is a good time to create tickets in order
> > to track all of that otherwise the chance that all these
> > limitations/improvements will not be addressed is very high.
> >
> > Thanks,
> > S.
> >
> > вт, 16 июл. 2019 г. в 09:07, Anton Vinogradov <av@apache.org>:
> >
> > > Svala,
> > >
> > > >> Could you please take a look at PR:
> > > Going to review today, thanks for attaching the bot visa.
> > >
> > > >> 1. Should I consider that my cluster is broken? There is no answer!
> The
> > > >> false-positive result is possible.
> > > That's a question about atomic nature.
> > > It's not impossible to lock atomic entry to perform the check.
> > > You should perform some attempts, it's your decision how many.
> > > By default, atomic RR performs 3 attempts, you may increase this by
> setting
> > > IGNITE_NEAR_GET_MAX_REMAPS or by just performing additional gets.
> > >
> > > >> 2. What should be done here in order to check/resolve the issue?
> > > Perhaps, I
> > > >> should restart a node (which one?), restart the whole cluster, put
> a new
> > > >> value...
> > > It's not possible, currently, to fix atomic caches.
> > > You may only check the consistency. And it's better than nothing, I
> think.
> > > We should find a way how to fix atomic consistency first.
> > > A possible strategy is to use ЕntryProcessor which will replace all
> owner's
> > > values with "latest" and do nothing in case newest (than latest) value
> > > found (opposite to preloading approach).
> > >
> > > >> 3. IgniteConsistencyViolationException is absolutely useless. It
> does
> > > not
> > > >> provide any information about the issue and possible way to fix it.
> > > It means that some keys from your get operation are broken.
> > > IgniteConsistencyViolationException CAN be extended with a list of
> broken
> > > keys in the future.
> > >
> > > >> It seems that transactional caches are covered much better.
> > > Correct.
> > > Tx caches consistency is more important that atomic consistency,
> that's why
> > > it was implemented first.
> > > BTW, AFAIK, atomics also were not fixed at 10078 [1].
> > >
> > > >> Well, near caches are widely used and fully transactional, so I
> think it
> > > >> makes sense to support the feature for near caches too.
> > > As I told before, it will be nice to implement this in the future, but
> we
> > > have more important tasks for now.
> > > The main goal was to cover tx caches, to be able to fix them in case
> of the
> > > real problem at production.
> > >
> > > Summarizing the roadmap,
> > > My goal now is to finish the tx case, now we have an issue with false
> > > positive consistency violation [2].
> > > Also, we're going to update Jepsen tests [3] with RR to ensure tx
> caches
> > > fixed.
> > > Next main goal is to use RR at TC checks [4], help with this issue are
> > > appreciated.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-10078
> > > [2] https://issues.apache.org/jira/browse/IGNITE-11973
> > > [3] https://issues.apache.org/jira/browse/IGNITE-11972
> > > [4] https://issues.apache.org/jira/browse/IGNITE-11971
> > >
> > >
> > > On Mon, Jul 15, 2019 at 4:51 PM Dmitriy Pavlov <dpavlov@apache.org>
> wrote:
> > >
> > > > Ok,  thank you
> > > >
> > > > пн, 15 июл. 2019 г., 16:46 Nikolay Izhikov <nizhikov@apache.org>:
> > > >
> > > > > I did the review.
> > > > >
> > > > > пн, 15 июля 2019 г., 16:15 Dmitriy Pavlov <dpavlov@apache.org>:
> > > > >
> > > > > > Igniters, who did a review of
> > > > > > https://issues.apache.org/jira/browse/IGNITE-10663 before the
> merge?
> > > > > I've
> > > > > > checked both PR   https://github.com/apache/ignite/pull/5656
> and
> > > > Issue,
> > > > > > and dev.list thread and didn't find any LGTM.
> > > > > >
> > > > > > Anton, since you've rejected lazy consensus in our process,
we
> have
> > > RTC
> > > > > in
> > > > > > that (core) module. So I'd like to know if the fix was covered
> by the
> > > > > > review.
> > > > > >
> > > > > > Because you're a committer, a reviewer can be non-committer.
So,
> who
> > > > was
> > > > > a
> > > > > > reviewer? Or was process ignored?
> > > > > >
> > > > > > пн, 15 июл. 2019 г. в 15:37, Вячеслав Коптилин
<
> > > > slava.koptilin@gmail.com
> > > > > >:
> > > > > >
> > > > > > > Hello Anton,
> > > > > > >
> > > > > > > > I'd like to propose you to provide fixes as a PR since
you
> have a
> > > > > > vision
> > > > > > > of how it should be made. I'll review them and merge shortly.
> > > > > > > Could you please take a look at PR:
> > > > > > > https://github.com/apache/ignite/pull/6689
> > > > > > >
> > > > > > > > Since your comments mostly about Javadoc (does this
mean
> that my
> > > > > > solution
> > > > > > > is so great that you ask me only to fix Javadocs :) ?),
> > > > > > > In my humble opinion, I would consider this feature as
> experimental
> > > > one
> > > > > > (It
> > > > > > > does not seem production-ready).
> > > > > > > Let me clarify this with the following simple example:
> > > > > > >
> > > > > > >     try {
> > > > > > >         atomicCache.withReadRepair().getAll(keys);
> > > > > > >     }
> > > > > > >     catch (CacheException e) {
> > > > > > >         // What should be done here from the end-user point
of
> > > view?
> > > > > > >     }
> > > > > > >
> > > > > > > 1. Should I consider that my cluster is broken? There is
no
> answer!
> > > > The
> > > > > > > false-positive result is possible.
> > > > > > > 2. What should be done here in order to check/resolve the
> issue?
> > > > > > Perhaps, I
> > > > > > > should restart a node (which one?), restart the whole cluster,
> put
> > > a
> > > > > new
> > > > > > > value...
> > > > > > > 3. IgniteConsistencyViolationException is absolutely useless.
> It
> > > does
> > > > > not
> > > > > > > provide any information about the issue and possible way
to
> fix it.
> > > > > > >
> > > > > > > It seems that transactional caches are covered much better.
> > > > > > >
> > > > > > > > Mostly agree with you, but
> > > > > > > > - MVCC is not production ready,
> > > > > > > > - not sure near support really required,
> > > > > > > > - metrics are better for monitoring, but the Event
is enough
> for
> > > my
> > > > > > wish
> > > > > > > to
> > > > > > > > cover AI with consistency check,
> > > > > > > > - do we really need Platforms and Thin Client support?
> > > > > > > Well, near caches are widely used and fully transactional,
so I
> > > think
> > > > > it
> > > > > > > makes sense to support the feature for near caches too.
> > > > > > > .Net is already aware of 'ReadRepair'. It seems to me,
that it
> can
> > > be
> > > > > > > easily supported for C++. I don't see a reason why it should
> not be
> > > > > done
> > > > > > :)
> > > > > > >
> > > > > > > > Do you mean per partition check and recovery? That's
a good
> idea,
> > > > > but I
> > > > > > > found it's not easy to imagine API to for such tool.
> > > > > > > Yep, perhaps it can be done on the idle cluster via
> `idle-verify`
> > > > > command
> > > > > > > with additional flag. Agreed, that this approach is not
the
> best
> > > one
> > > > > > > definitely.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > S.
> > > > > > >
> > > > > > > чт, 11 июл. 2019 г. в 09:53, Anton Vinogradov <av@apache.org>:
> > > > > > >
> > > > > > > > Slava,
> > > > > > > >
> > > > > > > > Thanks for your review first!
> > > > > > > >
> > > > > > > > >> Anyway, I left some comments in your pull-request
at
> github.
> > > > > Please
> > > > > > > take
> > > > > > > > a
> > > > > > > > >> look. The most annoying thing is poorly documented
code :(
> > > > > > > > Since your comments mostly about Javadoc (does this
mean
> that my
> > > > > > solution
> > > > > > > > is so great that you ask me only to fix Javadocs :)
?),
> > > > > > > > I'd like to propose you to provide fixes as a PR since
you
> have a
> > > > > > vision
> > > > > > > of
> > > > > > > > how it should be made. I'll review them and merge
shortly.
> > > > > > > >
> > > > > > > > >> By the way, is it required to add test related
to
> fail-over
> > > > > > scenarios?
> > > > > > > > The best check is to use RR at real code.
> > > > > > > > For example, I'm injecting RR now to the test with
concurrent
> > > > > > > modifications
> > > > > > > > and restarts [1].
> > > > > > > >
> > > > > > > > >> I just checked, the IEP page and I still
cannot find Jira
> > > > tickets
> > > > > > that
> > > > > > > > >> should cover existing limitations/improvements.
> > > > > > > > >> I would suggest creating the following tasks,
at least:
> > > > > > > > Mostly agree with you, but
> > > > > > > > - MVCC is not production ready,
> > > > > > > > - not sure near support really required,
> > > > > > > > - metrics are better for monitoring, but the Event
is enough
> for
> > > my
> > > > > > wish
> > > > > > > to
> > > > > > > > cover AI with consistency check,
> > > > > > > > - do we really need Platforms and Thin Client support?
> > > > > > > > Also, we should not produce stillborn issue.
> > > > > > > > All limitations listed at proxy creation method and
they
> > > definitely
> > > > > are
> > > > > > > not
> > > > > > > > showstoppers and may be fixed later if someone interested.
> > > > > > > > Сoming back to AI 3.0 discussion, we have A LOT of
features
> and
> > > > it's
> > > > > > > almost
> > > > > > > > impossible (require much more time that feature's
cost) to
> > > support
> > > > > them
> > > > > > > > all.
> > > > > > > > I will be pretty happy in case someone will do this
and
> provide
> > > > help
> > > > > if
> > > > > > > > necessary!
> > > > > > > >
> > > > > > > > >> Perhaps, it would be a good idea to think
about the
> recovery
> > > too
> > > > > > > > Do you mean per partition check and recovery?
> > > > > > > > That's a good idea, but I found it's not easy to imagine
API
> to
> > > for
> > > > > > such
> > > > > > > > tool.
> > > > > > > > In case you ready to assist with proper API/design
this will
> > > > > definitely
> > > > > > > > help.
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > > > > > > >
> > > > > > > > On Wed, Jul 10, 2019 at 3:43 PM Вячеслав Коптилин
<
> > > > > > > > slava.koptilin@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Perhaps, it would be a good idea to think about
the
> recovery
> > > > tool/
> > > > > > > > > control-utility command that will allow achieving
the same
> > > goal.
> > > > > > > > > If I am not mistaken it was already proposed
in the email
> > > thread.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > S.
> > > > > > > > >
> > > > > > > > > ср, 10 июл. 2019 г. в 15:33, Вячеслав
Коптилин <
> > > > > > > slava.koptilin@gmail.com
> > > > > > > > >:
> > > > > > > > >
> > > > > > > > > > Hi Anton,
> > > > > > > > > >
> > > > > > > > > > Well, the ReadRepair feature is finally
merged and that
> is
> > > good
> > > > > > news
> > > > > > > :)
> > > > > > > > > >
> > > > > > > > > > Unfortunately, I cannot find a consensus
about the whole
> > > > > > > functionality
> > > > > > > > in
> > > > > > > > > > any of these topics:
> > > > > > > > > >  -
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html
> > > > > > > > > >  -
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html
> > > > > > > > > > Also, there are no comments/discussion in
JIRA. That
> makes me
> > > > sad
> > > > > > :(
> > > > > > > > > > especially when a feature is huge, not obvious
and
> involves
> > > > > > changing
> > > > > > > > > public
> > > > > > > > > > API (and that is the case, I think).
> > > > > > > > > >
> > > > > > > > > > Anyway, I left some comments in your pull-request
at
> github.
> > > > > Please
> > > > > > > > take
> > > > > > > > > a
> > > > > > > > > > look. The most annoying thing is poorly
documented code
> :(
> > > > > > > > > > By the way, is it required to add test related
to
> fail-over
> > > > > > > scenarios?
> > > > > > > > > >
> > > > > > > > > > I just checked, the IEP page and I still
cannot find Jira
> > > > tickets
> > > > > > > that
> > > > > > > > > > should cover existing limitations/improvements.
> > > > > > > > > > I would suggest creating the following tasks,
at least:
> > > > > > > > > >  - MVCC support
> > > > > > > > > >  - Near caches
> > > > > > > > > >  - Additional metrics (number of violations,
number of
> > > repaired
> > > > > > > entries
> > > > > > > > > > etc)
> > > > > > > > > >  - Ignite C++ (It looks like, .Net is already
aware of
> that
> > > > > > feature)
> > > > > > > > > >  - Thin clients support
> > > > > > > > > >  - Perhaps, it would be useful to support
different
> > > strategies
> > > > to
> > > > > > > > resolve
> > > > > > > > > > inconsistencies
> > > > > > > > > >
> > > > > > > > > > What do you think?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > S.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > ср, 10 июл. 2019 г. в 10:16, Anton
Vinogradov <
> av@apache.org
> > > >:
> > > > > > > > > >
> > > > > > > > > >> Folks,
> > > > > > > > > >>
> > > > > > > > > >> Thanks to everyone for tips and reviews.
> > > > > > > > > >> Yardstick checked, no performance drop
found.
> > > > > > > > > >> Additional measurement: RR get() is
just up to 7% slower
> > > than
> > > > > > > regular
> > > > > > > > > get
> > > > > > > > > >> on real benchmarks (8 clients, 4 servers,
3 backups)
> > > > > > > > > >> Code merged to the master.
> > > > > > > > > >> "Must have" tasks created and attached
to the IEP.
> > > > > > > > > >>
> > > > > > > > > >> On Thu, Jul 4, 2019 at 12:18 PM Anton
Vinogradov <
> > > > av@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Folks,
> > > > > > > > > >> >
> > > > > > > > > >> > Just a minor update.
> > > > > > > > > >> >
> > > > > > > > > >> > RunAll [1] with enabled ReadRepair
proxy is almost
> green
> > > now
> > > > > > (~10
> > > > > > > > > tests
> > > > > > > > > >> > left, started with 6k :)).
> > > > > > > > > >> > During the analisys, I've found
some tests with
> > > > > > > > > >> > - unexpected repairs at tx caches
> > > > > > > > > >> > - inconsistent state after the
test finished
> (different
> > > > > entries
> > > > > > > > across
> > > > > > > > > >> the
> > > > > > > > > >> > topology)
> > > > > > > > > >> > For example,
> > > > > > > > > >> > - testInvokeAllAppliedOnceOnBinaryTypeRegistration
> > > generates
> > > > > > > > obsolete
> > > > > > > > > >> > versions on backups in case of
retry, fixed [2]
> > > > > > > > > >> > - initial cache load generates
not equal versions on
> > > > backups,
> > > > > > > fixed
> > > > > > > > > [3]
> > > > > > > > > >> > - testAccountTxNodeRestart causes
unexpected repairs
> > > > (entries
> > > > > > have
> > > > > > > > > >> > different versions), to be investigated.
> > > > > > > > > >> >
> > > > > > > > > >> > What's next?
> > > > > > > > > >> >
> > > > > > > > > >> > 1) Going to merge the solution
once "RunAll with
> > > ReadRepair
> > > > > > > enabled"
> > > > > > > > > >> > becomes fully green.
> > > > > > > > > >> > 2) Going to add special check after
each test which
> will
> > > > > ensure
> > > > > > > > caches
> > > > > > > > > >> > content after the test is consistent.
> > > > > > > > > >> > 2.1) The Same check can (should?)
be injected to
> > > > > > > > > >> > awaitPartitionMapExchange() and
similar methods.
> > > > > > > > > >> > 3) Update Jepsen tests with RR
checks.
> > > > > > > > > >> > 4) Introduce per partition RR.
> > > > > > > > > >> >
> > > > > > > > > >> > So, the final goal is to be sure
that Ignite produces
> only
> > > > > > > > consistent
> > > > > > > > > >> data
> > > > > > > > > >> > and to have a feature to solve
consistency in case we
> gain
> > > > > > > > > inconsistent
> > > > > > > > > >> > state somehow.
> > > > > > > > > >> >
> > > > > > > > > >> > Limitations?
> > > > > > > > > >> >
> > > > > > > > > >> > Currently, RR has some limitations,
but they are not
> > > related
> > > > > to
> > > > > > > real
> > > > > > > > > >> > production cases.
> > > > > > > > > >> > In case someone interested to support,
for example,
> MVCC
> > > or
> > > > > near
> > > > > > > > > caches,
> > > > > > > > > >> > please, feel free to contribute.
> > > > > > > > > >> >
> > > > > > > > > >> > [1]
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://mtcga.gridgain.com/pr.html?serverId=apache&suiteId=IgniteTests24Java8_RunAll&branchForTc=pull/6575/head&action=Latest
> > > > > > > > > >> > [2]
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/ignite/pull/5656/commits/6f6ec4434095e692af209c61833a350f3013408c
> > > > > > > > > >> > [3]
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/ignite/pull/5656/commits/255e552b474839e470c66a77e74e3c807bc76f13
> > > > > > > > > >> >
> > > > > > > > > >> > On Fri, Jun 28, 2019 at 2:41 PM
Anton Vinogradov <
> > > > > av@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> >> Slava,
> > > > > > > > > >> >>
> > > > > > > > > >> >> >> I will take a look
at your pull request if you
> don't
> > > > mind.
> > > > > > > > > >> >> Great news!
> > > > > > > > > >> >>
> > > > > > > > > >> >> >> In any way, could
you please update the IEP page
> with
> > > > the
> > > > > > list
> > > > > > > > of
> > > > > > > > > >> >> >> constraints/limitations
of the proposed approach,
> > > TODOs,
> > > > > > etc?
> > > > > > > > > >> >> Not sure we should keep this
at IEP until list (#4
> from
> > > > > > original
> > > > > > > > > >> letter)
> > > > > > > > > >> >> is not confirmed.
> > > > > > > > > >> >>
> > > > > > > > > >> >> Currently, I'm checking RunAll
with RR enabled to
> almost
> > > > each
> > > > > > get
> > > > > > > > > >> request.
> > > > > > > > > >> >> "Almost" means: readRepair
= !ctx.readThrough() &&
> > > > > > > > > >> >> ctx.config().getBackups() >
0 && !ctx.isNear() &&
> > > > > > > > !ctx.mvccEnabled()
> > > > > > > > > >> >> For now I have 60 failed tests
and amount decreasing.
> > > > > > > > > >> >>
> > > > > > > > > >> >> >> For instance, I would
like to see all these
> > > limitations
> > > > on
> > > > > > the
> > > > > > > > IEP
> > > > > > > > > >> >> page as
> > > > > > > > > >> >> >> JIRA tickets. Perhaps,
it would be good to create
> an
> > > > > > > > epic/umbrella
> > > > > > > > > >> >> ticket
> > > > > > > > > >> >> >> in order to track
all activities related to `Read
> > > > Repair`
> > > > > > > > feature.
> > > > > > > > > >> >> Let's do this at merge day
to be sure useless issues
> will
> > > > not
> > > > > > be
> > > > > > > > > >> created.
> > > > > > > > > >> >>
> > > > > > > > > >> >> On Fri, Jun 28, 2019 at 2:01
PM Вячеслав Коптилин <
> > > > > > > > > >> >> slava.koptilin@gmail.com>
wrote:
> > > > > > > > > >> >>
> > > > > > > > > >> >>> Hi Anton,
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> I will take a look at your
pull request if you don't
> > > mind.
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> In any way, could you please
update the IEP page
> with
> > > the
> > > > > list
> > > > > > > of
> > > > > > > > > >> >>> constraints/limitations
of the proposed approach,
> TODOs,
> > > > > etc?
> > > > > > > > > >> >>> For instance, I would like
to see all these
> limitations
> > > on
> > > > > the
> > > > > > > IEP
> > > > > > > > > >> page
> > > > > > > > > >> >>> as
> > > > > > > > > >> >>> JIRA tickets. Perhaps,
it would be good to create an
> > > > > > > epic/umbrella
> > > > > > > > > >> ticket
> > > > > > > > > >> >>> in order to track all activities
related to `Read
> > > Repair`
> > > > > > > feature.
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> Thanks,
> > > > > > > > > >> >>> S.
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> чт, 20 июн. 2019 г.
в 14:15, Anton Vinogradov <
> > > > > av@apache.org
> > > > > > >:
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> > Igniters,
> > > > > > > > > >> >>> > I'm glad to introduce
Read Repair feature [0]
> provides
> > > > > > > > additional
> > > > > > > > > >> >>> > consistency guarantee
for Ignite.
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> > 1) Why we need it?
> > > > > > > > > >> >>> > The detailed explanation
can be found at IEP-31
> [1].
> > > > > > > > > >> >>> > In short, because
of bugs, it's possible to gain
> an
> > > > > > > inconsistent
> > > > > > > > > >> state.
> > > > > > > > > >> >>> > We need additional
features to handle this case.
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> > Currently we able
to check cluster using
> Idle_verify
> > > [2]
> > > > > > > > feature,
> > > > > > > > > >> but
> > > > > > > > > >> >>> it
> > > > > > > > > >> >>> > will not fix the data,
will not even tell which
> > > entries
> > > > > are
> > > > > > > > > broken.
> > > > > > > > > >> >>> > Read Repair is a feature
to understand which
> entries
> > > are
> > > > > > > broken
> > > > > > > > > and
> > > > > > > > > >> to
> > > > > > > > > >> >>> fix
> > > > > > > > > >> >>> > them.
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> > 1) How it works?
> > > > > > > > > >> >>> > IgniteCache now able
to provide special proxy [3]
> > > > > > > > > withReadRepair().
> > > > > > > > > >> >>> > This proxy guarantee
that data will be gained
> from all
> > > > > > owners
> > > > > > > > and
> > > > > > > > > >> >>> compared.
> > > > > > > > > >> >>> > In the case of consistency
violation situation,
> data
> > > > will
> > > > > be
> > > > > > > > > >> recovered
> > > > > > > > > >> >>> and
> > > > > > > > > >> >>> > a special event recorded.
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> > 3) Naming?
> > > > > > > > > >> >>> > Feature name based
on Cassandra's Read Repair
> feature
> > > > [4],
> > > > > > > which
> > > > > > > > > is
> > > > > > > > > >> >>> pretty
> > > > > > > > > >> >>> > similar.
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> > 4) Limitations which
can be fixed in the future?
> > > > > > > > > >> >>> >   * MVCC and Near
caches are not supported.
> > > > > > > > > >> >>> >   * Atomic caches
can be checked (false positive
> case
> > > is
> > > > > > > > possible
> > > > > > > > > on
> > > > > > > > > >> >>> this
> > > > > > > > > >> >>> > check), but can't
be recovered.
> > > > > > > > > >> >>> >   * Partial entry
removal can't be recovered.
> > > > > > > > > >> >>> >   * Entries streamed
using data streamer (using
> not a
> > > > > > > > "cache.put"
> > > > > > > > > >> based
> > > > > > > > > >> >>> > updater) and loaded
by cache.load
> > > > > > > > > >> >>> >   are perceived as
inconsistent since they may
> have
> > > > > > different
> > > > > > > > > >> versions
> > > > > > > > > >> >>> for
> > > > > > > > > >> >>> > same keys.
> > > > > > > > > >> >>> >   * Only explicit
get operations are supported
> > > > > > (getAndReplace,
> > > > > > > > > >> >>> getAndPut,
> > > > > > > > > >> >>> > etc can be supported
in future).
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> > 5) What's left?
> > > > > > > > > >> >>> >   * SQL/ThinClient/etc
support.
> > > > > > > > > >> >>> >   * Metrics (found/repaired).
> > > > > > > > > >> >>> >   * Simple per-partition
recovery feature able to
> work
> > > > in
> > > > > > the
> > > > > > > > > >> >>> background in
> > > > > > > > > >> >>> > addition to per-entry
recovery feature.
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> > 6) Is code checked?
> > > > > > > > > >> >>> >   * Pull Request #5656
[5] (feature) - has green
> TC.
> > > > > > > > > >> >>> >   * Pull Request #6575
[6] (RunAll with the
> feature
> > > > > enabled
> > > > > > > for
> > > > > > > > > >> every
> > > > > > > > > >> >>> get()
> > > > > > > > > >> >>> > request) - has a limited
amount of failures
> (because
> > > of
> > > > > data
> > > > > > > > > >> streamer,
> > > > > > > > > >> >>> > cache.load, etc).
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> > Thoughts?
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> > [0]
> > > https://issues.apache.org/jira/browse/IGNITE-10663
> > > > > > > > > >> >>> > [1]
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
> > > > > > > > > >> >>> > [2]
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
> > > > > > > > > >> >>> > [3]
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140
> > > > > > > > > >> >>> > [4]
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
> > > > > > > > > >> >>> > [5] https://github.com/apache/ignite/pull/5656
> > > > > > > > > >> >>> > [6] https://github.com/apache/ignite/pull/6575
> > > > > > > > > >> >>> >
> > > > > > > > > >> >>>
> > > > > > > > > >> >>
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message