ignite-dev mailing list archives

From Vyacheslav Daradur <daradu...@gmail.com>
Subject Re: Discovery-based services deployment guarantees question
Date Mon, 30 Dec 2019 10:00:27 GMT
Alexey,

I would not make it default in the current implementation.

Waiting for proxies on nodes that are not the deployment initiator
should be improved first; additional checks are required:
1) We should not wait if the requested service has not been submitted
for deployment (that is, when there is no info about such a service).
2) If the service deployment failed, getting the proxy should fail or be
interrupted as well (instead of waiting for the whole timeout).
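
A minimal sketch of these two checks in plain Java, without Ignite on the
classpath. The ProxyWaitSketch class, its State and Result enums, and the
update/awaitDeployed methods are hypothetical names made up for illustration;
they are not part of the Ignite service processor, and how a node would
actually learn that a deployment was submitted or failed is left open here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of proxy waiting on a node that is not the deployment initiator. */
class ProxyWaitSketch {
    /** Deployment states this node may know about (assumed, not Ignite's). */
    enum State { DEPLOYING, DEPLOYED, FAILED }

    /** Outcome of waiting for a service before a proxy is handed out. */
    enum Result { READY, NOT_SUBMITTED, DEPLOY_FAILED, TIMED_OUT }

    /** Service name -> last known state; guarded by mux. */
    private final Map<String, State> states = new HashMap<>();
    private final Object mux = new Object();

    /** Records a state change and wakes up all waiters. */
    void update(String name, State state) {
        synchronized (mux) {
            states.put(name, state);
            mux.notifyAll();
        }
    }

    /** Waits up to timeoutMs, but returns early in the two cases listed above. */
    Result awaitDeployed(String name, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (mux) {
            while (true) {
                State s = states.get(name);
                if (s == null)
                    return Result.NOT_SUBMITTED;   // check 1: nothing known, do not wait
                if (s == State.FAILED)
                    return Result.DEPLOY_FAILED;   // check 2: fail without using up the timeout
                if (s == State.DEPLOYED)
                    return Result.READY;
                long leftMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (leftMs <= 0)
                    return Result.TIMED_OUT;
                mux.wait(leftMs);                  // re-check the state after every wake-up
            }
        }
    }
}
```

With such checks the full timeout is spent only while a deployment is known
to be in progress.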

Let's schedule this improvement for the next release; I'll try to find
time to implement it.

What do you think?

On Mon, Dec 30, 2019 at 12:05 PM Alexey Goncharuk
<alexey.goncharuk@gmail.com> wrote:
>
> Vyacheslav, thanks for the explanation, makes sense to me.
>
> I was thinking though, should we make the behavior with the timeout default
> for all proxies?
>
> Just my opinion - I think for a user it would be hard to control which node
> deploys the service, especially if multiple nodes deploy it concurrently.
> Most likely users will end up always calling the second option of the proxy
> (with the timeout), so, perhaps, make it default?
>
> On Sun, Dec 29, 2019 at 9:05 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
>
> > Alexey,
> >
> > I've prepared a PR [1] to show our proxy invocation guarantees and to
> > avoid misunderstanding.
> >
> > Please let me know if you think that we should improve our guarantees
> > in some cases.
> >
> > [1] https://github.com/apache/ignite/pull/7213
> >
> > On Tue, Dec 24, 2019 at 7:27 PM Vyacheslav Daradur <daradurvs@gmail.com>
> > wrote:
> > >
> > > > even the local deployment looks broken: if a compute job
> > > > is sent to a remote node after the service deployment
> > >
> > > This is a different case and covered by retries:
> > > * If you deploy a service from node A to node B and then take a proxy
> > > from node A (the deployment initiator), it should NOT fail even if node B
> > > has not yet received the message that deployment finished successfully,
> > > because of proxy invocation retries.
> > >
> > > Looks like it's better to describe all these cases on the wiki.
> > >
> > > > Should we schedule this ticket for the further work on Services IEP?
> > >
> > > If it is a frequent use-case we definitely should implement it.
> > >
> > >
> > > On Tue, Dec 24, 2019 at 6:55 PM Alexey Goncharuk
> > > <alexey.goncharuk@gmail.com> wrote:
> > > >
> > > > Ok, got it.
> > > >
> > > > I agree that this is consistent with the old behavior, but this is the
> > > > kind of errors we wanted to get rid of when we started the IEP. From the
> > > > user perspective, even the local deployment looks broken: if a compute
> > > > job is sent to a remote node after the service deployment, the job
> > > > execution may fail due to this error.
> > > >
> > > > Should we schedule this ticket for the further work on Services IEP?
> > > >
> > > > On Tue, Dec 24, 2019 at 6:49 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > >
> > > > > Not sure that "user fallback" is the right definition; this is not
> > > > > new behaviour in comparison with the legacy implementation.
> > > > >
> > > > > Our synchronous deployment guarantees that the deployment
> > > > > initiator can start working with the service immediately after
> > > > > the deployment has finished successfully.
> > > > > For a node that is not the deployment initiator we can't provide such
> > > > > guarantees now, because the deployment result is unknown there and
> > > > > the deployment may have failed.
> > > > >
> > > > > In this case, a reasonable timeout might be an acceptable solution.
> > > > >
> > > > > We can improve the guarantees in future releases, but there is an open
> > > > > question:
> > > > > - how long should taking a proxy wait? Deployment of a "heavy"
> > > > > service may take a while.
> > > > >
> > > > > On Tue, Dec 24, 2019 at 6:19 PM Alexey Goncharuk
> > > > > <alexey.goncharuk@gmail.com> wrote:
> > > > > >
> > > > > > What should be the user fallback in this case? Retry infinitely?
> > > > > > Is there a way to wait for the proper deployment?
> > > > > >
> > > > > > On Tue, Dec 24, 2019 at 12:41 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > >
> > > > > > > I’ll take a look at the end of the week.
> > > > > > >
> > > > > > > There is one more use-case:
> > > > > > > * if you initiate deployment from node A but get a proxy on node B
> > > > > > > (which isn’t the deployment initiator) to call the service on node A,
> > > > > > > it may fail with "service not found"; this is expected behaviour
> > > > > > > because we didn't provide such guarantees.
> > > > > > >
> > > > > > > The API for getting a proxy with a timeout should be used in this case:
> > > > > > > T serviceProxy(String name, Class<? super T> svcItf, boolean sticky,
> > > > > > > long timeout)
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Dec 24, 2019 at 12:11 PM Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > > > > > >
> > > > > > > > Well, this is exactly the case. The service is deployed from
> > > > > > > > node A, the proxy is created on node B, and "service not found"
> > > > > > > > exception gets thrown to a user anyway. Perhaps, the retry happens
> > > > > > > > too fast?
> > > > > > > >
> > > > > > > > Created a ticket [1].
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12490
> > > > > > > >
> > > > > > > > On Mon, Dec 23, 2019 at 10:08 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi, Alexey
> > > > > > > > >
> > > > > > > > > Please attach a reproducer to the ticket.
> > > > > > > > >
> > > > > > > > > As far as I remember, we have the following behaviour for the
> > > > > > > > > proxies:
> > > > > > > > >
> > > > > > > > > Let's assume you have deployed a service from node A, then:
> > > > > > > > > * if you invoke the service locally from node A, the service is
> > > > > > > > > guaranteed to be deployed and ready to work
> > > > > > > > > * if you take a proxy from node A to remote node B right after
> > > > > > > > > deploy, there might be a race between disco-spi (a message which
> > > > > > > > > releases the deployed service) and comm-spi (the remote call works
> > > > > > > > > via Compute over comm-spi), but it shouldn't affect end users
> > > > > > > > > because the failed request will be retried in this case
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 23, 2019 at 6:55 PM Alexey Goncharuk
> > > > > > > > > <alexey.goncharuk@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Nikolay,
> > > > > > > > > >
> > > > > > > > > > Yes, I've rechecked, the new service processor is being used.
> > > > > > > > > > I'll file a bug shortly.
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 23, 2019 at 5:33 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Alexey, are you sure you are testing the new service framework?
> > > > > > > > > > >
> > > > > > > > > > > If yes, you definitely should file a bug.
> > > > > > > > > > >
> > > > > > > > > > > > On Dec 23, 2019, at 5:02 PM, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Igniters,
> > > > > > > > > > > >
> > > > > > > > > > > > I have a question based on debugging one of my recent tests.
> > > > > > > > > > > >
> > > > > > > > > > > > The test is related to Ignite services. I noticed that sometimes a
> > > > > > > > > > > > proxy invocation of a newly deployed service fails because the
> > > > > > > > > > > > service cannot be found. I managed to reduce the test to a simple
> > > > > > > > > > > > "start two nodes, deploy a service, create a proxy, invoke the
> > > > > > > > > > > > proxy" scenario. The proxy invocation fails in about 80% of runs.
> > > > > > > > > > > >
> > > > > > > > > > > > As far as I remember, the new discovery-based service deployment
> > > > > > > > > > > > was supposed to be synchronous, so not only non-proxy service
> > > > > > > > > > > > instances should work, but the proxies as well. Was my
> > > > > > > > > > > > understanding correct? Should I file a bug for the observed
> > > > > > > > > > > > behavior?
> > > > > > > > > > > >
> > > > > > > > > > > > --AG
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards, Vyacheslav D.
> > > > >
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> >
> >
> >
> > --
> > Best Regards, Vyacheslav D.
> >



-- 
Best Regards, Vyacheslav D.
