ignite-dev mailing list archives

From Anton Vinogradov <avinogra...@gridgain.com>
Subject Re: Add emergency node closing handler to public Ignite API
Date Wed, 15 Nov 2017 11:39:08 GMT
According to [1], the reasons are:
- IgniteOutOfMemoryException
- Persistence errors
- ExchangeWorker exits with error

[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-7%3A+Ignite+internal+problems+detection
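
To make the "keep the shutdown reason somewhere" point from the quoted thread concrete, here is a minimal sketch in plain Java. It is an illustration only and not Ignite code: ShutdownReason, NodeStopGuard and the class name are hypothetical, the enum values merely mirror the reasons listed above plus an ordinary user-initiated stop, and nothing here reflects an agreed API. The assumption is that the first reported reason wins and that the user callback runs once.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Illustrative only: none of these types exist in Ignite. */
public class ShutdownReasonSketch {
    /** Hypothetical reasons, mirroring the list above plus a normal stop. */
    enum ShutdownReason { OUT_OF_MEMORY, PERSISTENCE_ERROR, EXCHANGE_WORKER_FAILURE, USER_REQUEST }

    /** Records the first reported reason and notifies a user callback once. */
    static final class NodeStopGuard {
        private final AtomicReference<ShutdownReason> reason = new AtomicReference<>();
        private final Consumer<ShutdownReason> handler;

        NodeStopGuard(Consumer<ShutdownReason> handler) {
            this.handler = Objects.requireNonNull(handler, "handler");
        }

        /** Returns true only for the call that actually recorded the reason. */
        boolean stop(ShutdownReason r) {
            Objects.requireNonNull(r, "reason");

            if (!reason.compareAndSet(null, r))
                return false; // A stop was already requested; keep the original reason.

            handler.accept(r);
            return true;
        }

        /** Returns the recorded reason, or null if no stop was requested. */
        ShutdownReason reason() {
            return reason.get();
        }
    }

    public static void main(String[] args) {
        NodeStopGuard guard = new NodeStopGuard(
            r -> System.out.println("Node is stopping, reason: " + r));

        guard.stop(ShutdownReason.PERSISTENCE_ERROR); // Recorded, handler is called.
        guard.stop(ShutdownReason.USER_REQUEST);      // Ignored, first reason wins.

        System.out.println("Recorded reason: " + guard.reason());
    }
}
```

In a real node the callback could be whatever hook is eventually chosen (for example a LifecycleBean reacting to a stop event); the sketch only shows that the reason has to be stored before that hook fires so that it can be read later.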

On Wed, Nov 15, 2017 at 2:24 PM, Vladimir Ozerov <vozerov@gridgain.com>
wrote:

> I am not quite sure I understand how the tasks are split. How can we discuss
> graceful shutdown without discussing the reasons for this shutdown? What
> leads to it?
>
> On Wed, Nov 15, 2017 at 2:10 PM, Anton Vinogradov <
> avinogradov@gridgain.com>
> wrote:
>
> > Vova,
> >
> > Currently we have a lot of IEPs to improve grid monitoring and behavior.
> >
> > Let's split the tasks into:
> >
> > 1) Graceful shutdown.
> > In this case we'd like to give the user the ability to do something.
> > LifecycleBean is what we are looking for, thanks for the tips!
> > But we have to keep the shutdown reason somewhere.
> > If you know where it is already kept, please let us know.
> >
> > 2) OOM or any other reason that causes a node crash.
> > In this case some watchdog (like [1] or [2]) should monitor whether the
> > node is alive.
> >
> > 3) GC and deadlock (Java and tx) issues
> > These should be monitored by a special thread [3] or published via metrics [4]
> >
> > 4) Throughput, latency and space issues
> > Special metrics should be developed according to [5]
> >
> > Andrey is asking about case #1 (graceful shutdown), so let's discuss only
> > this case.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-6587
> > [2] https://wrapper.tanukisoftware.com/doc/english/download.jsp
> > [3] https://issues.apache.org/jira/browse/IGNITE-6171
> > [4] https://cwiki.apache.org/confluence/display/IGNITE/IEP-7%3A+Ignite+internal+problems+detection
> > [5] https://cwiki.apache.org/confluence/display/IGNITE/IEP-6%3A+Metrics+improvements
> >
> >
> > On Wed, Nov 15, 2017 at 1:34 PM, Vladimir Ozerov <vozerov@gridgain.com>
> > wrote:
> >
> > > AFAIK the idea was not only to shut down the node, but also to give the user
> > > (e.g. an administrator) the ability to observe the problem from the outside,
> > > e.g. through JMX. E.g. if we detect a Java-level deadlock, it doesn't mean that
> > > the only possible solution is node shutdown. The reaction could also be a no-op,
> > > e.g. to give the user a chance to collect additional system info, or simply
> > > because this particular deadlock is resolvable (e.g.
> > > Lock.lockInterruptibly()). So as we need to expose health info through JMX
> > > anyway, we could also give the user programmatic access to it.
> > > Alternatively, we can expose this info through JMX only and ask the user to get
> > > an instance of that bean manually.
> > >
> > > On Wed, Nov 15, 2017 at 1:19 PM, Anton Vinogradov <
> > > avinogradov@gridgain.com>
> > > wrote:
> > >
> > > > Vova,
> > > >
> > > > Could you point to metric you're talking about?
> > > >
> > > > On Wed, Nov 15, 2017 at 1:06 PM, Andrey Kuznetsov <stkuzma@gmail.com>
> > > > wrote:
> > > >
> > > > > Vladimir,
> > > > >
> > > > > Could you please clarify what local metrics are? Should I extend the Ignite
> > > > > interface by adding something similar to dataRegionMetrics(), or is there
> > > > > some universal mechanism to handle metrics?
> > > > >
> > > > > 2017-11-15 8:30 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
> > > > > >
> > > > > > This information should be available through local metrics, so that it is
> > > > > > accessible from the Ignite instance.
> > > > > >
> > > > >
> > > >
> > >
> >
>
