ignite-dev mailing list archives

From Vladisav Jelisavcic <vladis...@gmail.com>
Subject Re: IGNITE-4155 IgniteSemaphoreExample unexpected behavior
Date Fri, 11 Nov 2016 06:27:24 GMT
Hi Sergey,

thanks for finding and submitting this bug!

Best regards,
Vladisav

On Thu, Nov 10, 2016 at 1:46 PM, Sergey Chugunov <sergey.chugunov@gmail.com>
wrote:

> Hello Vladisav,
>
> Thanks for confirmation!
>
> I created a JIRA ticket <https://issues.apache.org/jira/browse/IGNITE-4209>
> to track this issue; feel free to edit it if it isn't descriptive enough.
>
> Thank you,
> Sergey.
>
> On Thu, Nov 10, 2016 at 9:44 AM, Vladisav Jelisavcic <vladisavj@gmail.com>
> wrote:
>
> > Hi Sergey,
> >
> > you are right - I can reproduce this also.
> > It seems to me that this is caused by treating the EVT_NODE_LEFT and
> > EVT_NODE_FAILED events the same way.
> > In this case the node leaves the topology without failing, but does not
> > manage to release the semaphore before the EVT_NODE_LEFT event is
> > observed on the other nodes. This really is a bug.
> >
> > Thanks!
> > Vladisav
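
[Editor's note: the following is a minimal, self-contained model of the conflation Vladisav describes, not Ignite's actual internals; the class, method, and node names are hypothetical. The point it illustrates: if a graceful NODE_LEFT is handled exactly like a NODE_FAILED, a leave that races with the node's final release() breaks the semaphore just as a crash would.]

```java
import java.util.HashMap;
import java.util.Map;

public class SemaphoreEventModel {
    enum EventType { NODE_LEFT, NODE_FAILED }

    static class TrackedSemaphore {
        private final Map<String, Integer> held = new HashMap<>();
        private int permits;
        private boolean broken;

        TrackedSemaphore(int permits) { this.permits = permits; }

        void acquire(String nodeId) {
            permits--;
            held.merge(nodeId, 1, Integer::sum);
        }

        // The reported bug, modeled: both event types are handled the same,
        // so a graceful leave with a still-pending release() marks the
        // non-failover-safe semaphore broken exactly like a crash would.
        void onNodeEventConflated(EventType evt, String nodeId) {
            if (held.containsKey(nodeId))
                broken = true;
        }

        // Hypothetical fix sketch: only an actual failure breaks the
        // semaphore; a graceful leave gives the pending release a chance.
        void onNodeEventDistinguished(EventType evt, String nodeId) {
            if (evt == EventType.NODE_FAILED && held.containsKey(nodeId))
                broken = true;
        }

        boolean isBroken() { return broken; }
    }

    public static void main(String[] args) {
        TrackedSemaphore s1 = new TrackedSemaphore(1);
        s1.acquire("exampleNode");
        // The example node shuts down; its release() has not yet been
        // observed when the discovery event reaches the other nodes.
        s1.onNodeEventConflated(EventType.NODE_LEFT, "exampleNode");
        System.out.println("conflated handling broken = " + s1.isBroken());

        TrackedSemaphore s2 = new TrackedSemaphore(1);
        s2.acquire("exampleNode");
        s2.onNodeEventDistinguished(EventType.NODE_LEFT, "exampleNode");
        System.out.println("distinguished handling broken = " + s2.isBroken());
    }
}
```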
> >
> > On Wed, Nov 9, 2016 at 11:23 AM, Sergey Chugunov <sergey.chugunov@gmail.com>
> > wrote:
> >
> > > Hello Vladisav,
> > >
> > > I found this behavior in a very simple environment: I had two nodes on
> > > my local machine started by the *ExampleNodeStartup* class and another
> > > node started with the *IgniteSemaphoreExample* class.
> > >
> > > No modifications were made to any code or configuration, and I used the
> > > latest version of the code available in the master branch.
> > > No node failures occurred during test execution either.
> > >
> > > As far as I understood from a short investigation, the synchronization
> > > semaphore named "IgniteSemaphoreExample" goes into a broken state when
> > > the *IgniteSemaphoreExample* node finishes normally and disconnects
> > > from the cluster.
> > > After that the semaphore cannot be reused, and any new node that tries
> > > to do so hangs.
> > >
> > > Can you reproduce this? If so, I will submit a ticket and share it with
> > > you.
> > >
> > > Thank you,
> > > Sergey.
> > >
> > >
> > > On Wed, Nov 9, 2016 at 10:55 AM, Vladisav Jelisavcic <vladisavj@gmail.com>
> > > wrote:
> > >
> > > > Hi Sergey,
> > > >
> > > > can you please provide more information?
> > > > Have you changed the example (if so, can you provide the changes you
> > > > made)?
> > > > Did the example execute normally (without node failures)?
> > > >
> > > > In the example, the semaphore is created in non-failover-safe mode,
> > > > which means it is not safe to use once it is broken (similar to
> > > > CyclicBarrier in java.util.concurrent).
> > > > The semaphore is preserved even if the first node fails (provided
> > > > backups are configured), so if the first node failed, a (broken)
> > > > semaphore with the same name should still be in the cache.
> > > > This is expected behavior.
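
[Editor's note: the CyclicBarrier analogy above can be shown with a short, runnable JDK example. This is plain java.util.concurrent code, not Ignite: once a barrier is broken (here by a timeout, standing in for a party failure), it stays broken and any further await() fails fast rather than blocking, which is the behavior a non-failover-safe semaphore mirrors.]

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BrokenBarrierDemo {
    public static void main(String[] args) throws Exception {
        CyclicBarrier barrier = new CyclicBarrier(2);
        try {
            // Only one of the two required parties ever arrives; the
            // timeout breaks the barrier, as a party failure would.
            barrier.await(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("barrier broken = " + barrier.isBroken());
        }
        try {
            barrier.await(); // reuse after breakage fails immediately
        } catch (BrokenBarrierException e) {
            System.out.println("reuse throws BrokenBarrierException");
        }
    }
}
```

Unlike the reported semaphore hang, the broken barrier fails loudly on reuse; that contrast is exactly what Sergey questions further down the thread.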
> > > >
> > > > If this is not the case (the test executed normally), then please
> > > > submit a ticket describing your setup in more detail: how many nodes,
> > > > how many backups configured, etc.
> > > >
> > > > Thanks!
> > > > Vladisav
> > > >
> > > > On Tue, Nov 8, 2016 at 10:37 AM, Sergey Chugunov <sergey.chugunov@gmail.com>
> > > > wrote:
> > > >
> > > > >  Hello folks,
> > > > >
> > > > > I found the reason why *IgniteSemaphoreExample* hangs when started
> > > > > twice without restarting the cluster, and it doesn't seem minor to
> > > > > me anymore.
> > > > >
> > > > > From here on I'm going to refer to the example's code, so please
> > > > > have it open.
> > > > >
> > > > > So, when the first node running the example code finishes and
> > > > > leaves the cluster, the synchronization semaphore named
> > > > > "IgniteSemaphoreExample" goes into a broken state on all other
> > > > > cluster nodes.
> > > > > If I restart the example without restarting all nodes of the
> > > > > cluster, the final *acquire* call on the semaphore on the client
> > > > > side hangs, because all other nodes treat the semaphore as broken
> > > > > and their *release* calls on it don't increase the permits.
> > > > >
> > > > > There is an interesting comment inside its *tryReleaseShared*
> > > > > implementation (by the way, it is implemented in
> > > > > *GridCacheSemaphoreImpl*):
> > > > >
> > > > > // If broken, return immediately, exception will be thrown anyway.
> > > > > if (broken)
> > > > >     return true;
> > > > >
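
[Editor's note: the intent behind that "return true" can be illustrated with a toy AbstractQueuedSynchronizer subclass; this is a sketch, not GridCacheSemaphoreImpl. Returning true from tryReleaseShared (and letting tryAcquireShared succeed) when the sync is broken makes queued waiters wake up instead of blocking forever; the "exception will be thrown anyway" part is then the caller's job, which is exactly the step Sergey reports not observing.]

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

public class BrokenAwareSync {
    // Minimal AQS-based shared sync modeling the quoted pattern (a toy,
    // not Ignite's implementation).
    static class Sync extends AbstractQueuedSynchronizer {
        volatile boolean broken;

        Sync(int permits) { setState(permits); }

        int availablePermits() { return getState(); }

        @Override protected int tryAcquireShared(int acquires) {
            for (;;) {
                if (broken)
                    return 1; // let the waiter through so brokenness can be observed
                int available = getState();
                int remaining = available - acquires;
                if (remaining < 0 || compareAndSetState(available, remaining))
                    return remaining;
            }
        }

        @Override protected boolean tryReleaseShared(int releases) {
            // If broken, return true immediately so queued waiters wake up;
            // the caller is then expected to surface an exception.
            if (broken)
                return true;
            for (;;) {
                int current = getState();
                if (compareAndSetState(current, current + releases))
                    return true;
            }
        }
    }

    public static void main(String[] args) {
        Sync sync = new Sync(0);
        sync.broken = true;
        // With zero permits this would normally block forever; brokenness
        // lets it return, and a real implementation would then throw.
        sync.acquireShared(1);
        System.out.println("acquired despite zero permits = "
            + (sync.availablePermits() == 0));
    }
}
```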
> > > > > It seems that no exception is thrown either on the client side
> > > > > calling *acquire* or on the server side calling *release* on a
> > > > > broken semaphore.
> > > > >
> > > > > Does anybody know why it behaves this way? Is it expected behavior
> > > > > at all, and if so, where is it documented?
> > > > >
> > > > > Thanks,
> > > > > Sergey Chugunov.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Sergey Chugunov.
> > >
> >
>
