From: Valentin Kulichenko <valentin.kulichenko@gmail.com>
Date: Thu, 5 Apr 2018 12:32:12 -0700
Subject: Re: Service grid redesign
To: dev@ignite.apache.org

Denis,

This is why I'm suggesting to use DeploymentSpi for this. The way I see it, instead of deploying classes on the local classpath, the user can deploy them in the storage that the SPI points to. If a class is updated in the storage, Ignite detects this and automatically restarts the service. This is a very simple and straightforward approach that doesn't require a lot of changes on our side and allows us to reuse the existing implementation of DeploymentSpi.

-Val

On Thu, Apr 5, 2018 at 12:13 PM, Denis Magda wrote:

> > There is no need to deserialize services on the coordinator. It should only be able to calculate the assignments.
> > *LazyServiceConfiguration* should be used to deliver the service configurations, just like it is done right now.
>
> Can that configuration be tweaked over time, requiring the class to be updated on all the nodes (if, for instance, someone wants to deploy the next version of a service)? I just want to be sure we don't need to restart the cluster nodes (those that won't be used for service deployments) on service-related configuration changes.
>
> --
> Denis
>
> On Thu, Apr 5, 2018 at 8:18 AM, Denis Mekhanikov wrote:
>
> > Denis,
> > There is no need to deserialize services on the coordinator. It should only be able to calculate the assignments.
> > *LazyServiceConfiguration* should be used to deliver the service configurations, just like it is done right now.
> >
> > Val,
> > Usage of DeploymentSpi is a good idea, I didn't think about this possibility.
> > It is a viable alternative to peer class loading, though not as user-friendly.
> > But if peer class loading is that hard to implement, then I vote for DeploymentSpi.
> > As far as I understand, it won't require us to do any additional changes in Ignite, but it will make users think about using a proper DeploymentSpi. Please correct me if I'm wrong.
> > It would be good, though, to add some examples of service redeployment when the implementation class changes.
> >
> > Denis
> >
> > On Thu, 5 Apr 2018 at 2:33, Valentin Kulichenko <valentin.kulichenko@gmail.com> wrote:
> >
> > > I don't think peer class loading is even possible for services. I believe we should reuse DeploymentSpi [1] for versioning.
> > >
> > > [1] https://apacheignite.readme.io/docs/deployment-spi
> > >
> > > -Val
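For illustration, here is a minimal sketch of the DeploymentSpi approach discussed above. It assumes UriDeploymentSpi from the ignite-urideploy module pointed at a shared directory; the directory path is hypothetical, and the automatic service restart on a class update is the behavior proposed in this thread, not something Ignite guarantees today.

    import java.util.Arrays;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.deployment.uri.UriDeploymentSpi;

    public class DeploymentSpiExample {
        public static void main(String[] args) {
            // Point the deployment SPI at external storage instead of the local classpath.
            // The directory below is hypothetical; it would hold the packaged service classes.
            UriDeploymentSpi deploymentSpi = new UriDeploymentSpi();
            deploymentSpi.setUriList(Arrays.asList("file:///opt/ignite/services-deployment"));

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDeploymentSpi(deploymentSpi);

            // When a package in the directory is replaced, the SPI picks up the new classes --
            // the hook the proposed automatic service restart would rely on.
            Ignite ignite = Ignition.start(cfg);
        }
    }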
> > >
> > > On Wed, Apr 4, 2018 at 12:52 PM, Denis Magda wrote:
> > >
> > > > Sorry, that was me who renamed the IEP to "Oil Change in Service Grid". I was writing this email after the renaming. I like that title more because it's fun and highlights what we intend to do: clean up our service grid engine and power it up with new "liquid" (a new communication and deployment approach not available before).
> > > >
> > > > Denis
> > > >
> > > > > This message contains the serialized service instance and its configuration. It is delivered to the coordinator node first, which calculates the service deployment assignments and adds this information to the message.
> > > >
> > > > I would consider using a NodeFilter first to decide where a service can potentially be deployed. Otherwise, we would require the service classes to be on every node (every node might become a coordinator), which is not a desirable requirement.
> > > >
> > > > As for peer class loading, I would back Dmitriy up here. Let's at least not focus on this task for now. We should design service versioning in the right way first and support it.
> > > >
> > > > --
> > > > Denis
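As a rough illustration of the NodeFilter idea above, a sketch that restricts service placement to nodes carrying a custom attribute, so only those nodes need the service classes on their classpath. The attribute name, service name, and the no-op service implementation are made up for the example.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.services.Service;
    import org.apache.ignite.services.ServiceConfiguration;
    import org.apache.ignite.services.ServiceContext;

    public class NodeFilterExample {
        /** Trivial service used only to make the sketch self-contained. */
        public static class NoopService implements Service {
            @Override public void init(ServiceContext ctx) { /* no-op */ }
            @Override public void execute(ServiceContext ctx) { /* no-op */ }
            @Override public void cancel(ServiceContext ctx) { /* no-op */ }
        }

        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            ServiceConfiguration svcCfg = new ServiceConfiguration();
            svcCfg.setName("noopService");        // hypothetical service name
            svcCfg.setService(new NoopService());
            svcCfg.setTotalCount(2);
            svcCfg.setMaxPerNodeCount(1);

            // Only nodes started with a "service.host" attribute (hypothetical) are candidates,
            // so only those nodes have to carry the service classes.
            svcCfg.setNodeFilter(node -> Boolean.TRUE.equals(node.attribute("service.host")));

            ignite.services().deploy(svcCfg);
        }
    }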
> > > >
> > > > On Wed, Apr 4, 2018 at 12:20 PM, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
> > > >
> > > > > Here is the correct link:
> > > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Oil+Change+in+Service+Grid
> > > > >
> > > > > I have looked at the tickets there, and I believe that we should not support peer deployment for services. It is very hard, and I do not think we should even try.
> > > > >
> > > > > I am proposing closing this ticket as Won't Fix: https://issues.apache.org/jira/browse/IGNITE-975
> > > > >
> > > > > D.
> > > > >
> > > > > On Wed, Apr 4, 2018 at 5:39 AM, Denis Mekhanikov <dmekhanikov@gmail.com> wrote:
> > > > >
> > > > > > Vyacheslav,
> > > > > >
> > > > > > I've just posted my first draft of the IEP:
> > > > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Service+grid+improvements
> > > > > > It's not finished yet, but you can get the idea from it.
> > > > > > If you have some thoughts on your mind, please let me know and I'll add them to the IEP.
> > > > > >
> > > > > > Denis
> > > > > >
> > > > > > On Wed, 4 Apr 2018 at 13:09, Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > >
> > > > > > > Denis, thanks for the link.
> > > > > > >
> > > > > > > I looked through the task, and I think that I understand your redesign point now.
> > > > > > >
> > > > > > > Do you have a clear plan or IEP for the whole redesign?
> > > > > > >
> > > > > > > I'm interested in this component and I'd like to take part in the development.
> > > > > > >
> > > > > > > On Mon, Apr 2, 2018 at 2:55 PM, Denis Mekhanikov <dmekhanikov@gmail.com> wrote:
> > > > > > >
> > > > > > > > Vyacheslav,
> > > > > > > >
> > > > > > > > The service deployment design, based on a replicated utility cache, has proven to be unstable and deadlock-prone.
> > > > > > > > You can find a list of JIRA issues connected to it in my previous letter.
> > > > > > > >
> > > > > > > > The intention behind this is similar to the binary metadata redesign that happened in the following ticket: IGNITE-4157
> > > > > > > >
> > > > > > > > This change in the service deployment procedure will eliminate the need for another internal replicated cache and make service deployment more reliable on an unstable topology.
> > > > > > > >
> > > > > > > > Denis
> > > > > > > >
> > > > > > > > On Tue, 27 Mar 2018 at 23:21, Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi, Denis Mekhanikov!
> > > > > > > > >
> > > > > > > > > As far as I know, Ignite services are based on IgniteCache, and we have all its features. We can use listeners or continuous queries for deployment synchronization.
> > > > > > > > >
> > > > > > > > > Why do you want to use the discovery layer for that?
> > > > > > > > >
> > > > > > > > > One more thing: we could use a baseline approach for services. That means *IgniteService.deploy()* returns a ready-to-work service after deployment on the baseline nodes and deploys to other nodes on demand, for example when the deployed service's load gets high.
> > > > > > > > >
> > > > > > > > > About versioning, maybe there is sense in extending the public API: IgniteServices.service(name, *version*)?
> > > > > > > > >
> > > > > > > > > At first deployment, we can compute the service's hash code (just as an example) and store it. After a new deployment request for a service with an existing name, we compute the new service's hash code and compare the two; if they differ, we deploy the new service as a service with a different version.
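A sketch of the hash-based version detection suggested above, illustrative only: the helper below is not part of Ignite, and the version-aware lookup IgniteServices.service(name, version) mentioned in the message is a proposal, not part of the current API.

    import java.io.InputStream;
    import java.security.MessageDigest;

    public class ServiceVersionSketch {
        /** Fingerprints the class file bytes of a service implementation class. */
        public static String versionOf(Class<?> serviceClass) throws Exception {
            String resource = serviceClass.getName().replace('.', '/') + ".class";
            try (InputStream in = serviceClass.getClassLoader().getResourceAsStream(resource)) {
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; )
                    md.update(buf, 0, n);
                StringBuilder sb = new StringBuilder();
                for (byte b : md.digest())
                    sb.append(String.format("%02x", b));
                return sb.toString();
            }
        }

        public static void main(String[] args) throws Exception {
            // On a redeployment request, the stored fingerprint would be compared with the new one;
            // a mismatch would mean a new version of the service is being deployed.
            System.out.println(versionOf(ServiceVersionSketch.class));
        }
    }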
> > > > > > > > >
> > > > > > > > > On Fri, Mar 23, 2018 at 10:03 PM, Denis Magda <dmagda@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Denis,
> > > > > > > > > >
> > > > > > > > > > Thanks for the extensive analysis. There is vast room for optimization on the service grid side.
> > > > > > > > > >
> > > > > > > > > > Yakov, Sam, Alex G.,
> > > > > > > > > >
> > > > > > > > > > How do you like the idea of using the discovery protocol for the service grid system message exchange? Any pitfalls?
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Denis
> > > > > > > > > >
> > > > > > > > > > On Fri, Mar 23, 2018 at 8:01 AM, Denis Mekhanikov <dmekhanikov@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Igniters,
> > > > > > > > > > >
> > > > > > > > > > > I'd like to start a discussion on the Ignite service grid redesign. We have a number of problems in our current architecture that have to be addressed.
> > > > > > > > > > >
> > > > > > > > > > > Here are the most severe ones:
> > > > > > > > > > >
> > > > > > > > > > > One of them is the lack of a guarantee that a service is successfully deployed and ready for work by the time the *IgniteServices.deploy()* methods return. Furthermore, if an exception is thrown from the *Service.init()* method, the deploying side is not able to receive it, or even understand that the service is in an unusable state. So, you may end up in a situation where you deployed a service without receiving any errors, then called a service's method and hung indefinitely on that invocation.
> > > > > > > > > > > JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-3392
> > > > > > > > > > >
> > > > > > > > > > > Another problem is locking during service deployment on an unstable topology. This issue is caused by missed updates in the continuous query listeners on the internal cache. It is hard to reproduce, but it happens sometimes. We shouldn't allow the possibility that deployment methods hang without saying anything.
> > > > > > > > > > > JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-6259
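To make the first of the two problems above concrete, a minimal sketch of the described scenario. The service interface and names are invented, and whether the proxy call actually hangs depends on the Ignite version, as tracked in IGNITE-3392.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.services.Service;
    import org.apache.ignite.services.ServiceContext;

    public class BrokenInitExample {
        /** Hypothetical service interface used through a proxy. */
        public interface Echo {
            String echo(String msg);
        }

        /** Service whose initialization always fails. */
        public static class EchoService implements Service, Echo {
            @Override public void init(ServiceContext ctx) throws Exception {
                throw new IllegalStateException("init failed"); // may never be surfaced to the deploying side
            }
            @Override public void execute(ServiceContext ctx) { /* no-op */ }
            @Override public void cancel(ServiceContext ctx) { /* no-op */ }
            @Override public String echo(String msg) { return msg; }
        }

        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            // The described problem: deploy() may return without reporting the init() failure...
            ignite.services().deployClusterSingleton("echo", new EchoService());

            // ...and the subsequent proxy invocation can then block indefinitely.
            Echo echo = ignite.services().serviceProxy("echo", Echo.class, false);
            System.out.println(echo.echo("ping"));
        }
    }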
> > > > > > > > > > >
> > > > > > > > > > > I think we should change the deployment procedure to make it more reliable. Moving from operating over an internal replicated service cache to sending custom discovery events seems to be a good idea. Service deployment may trigger a discovery event that makes the chosen nodes deploy the service, and the same event will notify the other nodes about the deployed service instances. This will eliminate the need for distributed transactions on the internal replicated system cache and make the service deployment protocol more transparent.
> > > > > > > > > > >
> > > > > > > > > > > There are a few points that should be taken into account, though.
> > > > > > > > > > >
> > > > > > > > > > > First of all, we can't wait for services to be deployed and initialised in the discovery thread. So, we need to make the notification about the service deployment result asynchronous, presumably over the communication protocol. I can think of a procedure similar to the current exchange protocol, where service deployment is initiated with an initial discovery message, followed by asynchronous notifications from the hosting servers over communication. Finally, one more discovery message will notify all nodes about the service deployment result and the location of the deployed service instances. The coordinator will be responsible for collecting the deployment results in this scheme.
> > > > > > > > > > >
> > > > > > > > > > > Another problem is failover in case some nodes fail during deployment or further work. The following cases should be handled:
> > > > > > > > > > >
> > > > > > > > > > > 1. coordinator failure during deployment;
> > > > > > > > > > > 2. failure of nodes that were chosen to host the service, during deployment;
> > > > > > > > > > > 3. failure of nodes that contain deployed services, after the deployment.
> > > > > > > > > > >
> > > > > > > > > > > The first case may be resolved either by continuing the deployment with a new coordinator, or by cancelling it. The second case will require another node to be chosen and notified; maybe another discovery message will be needed. The third case will require redeployment, so the coordinator should track topology changes and redeploy failed services.
> > > > > > > > > > >
> > > > > > > > > > > Another good improvement would be service versioning. This matter was already discussed in another thread:
> > > > > > > > > > > http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html
> > > > > > > > > > > Let's resume this discussion and state the final decision here. This feature is closely connected to peer class loading, which is not working for services currently. So, service versioning should be implemented along with peer class loading.
> > > > > > > > > > > JIRA ticket for versioning: https://issues.apache.org/jira/browse/IGNITE-6069
> > > > > > > > > > > Peer class loading: https://issues.apache.org/jira/browse/IGNITE-975
> > > > > > > > > > >
> > > > > > > > > > > Please share your thoughts. Constructive criticism is highly appreciated.
> > > > > > > > > > >
> > > > > > > > > > > Denis
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards, Vyacheslav D.
> > > > > > >
> > > > > > > --
> > > > > > > Best Regards, Vyacheslav D.