ignite-dev mailing list archives

From Yakov Zhdanov <yzhda...@gridgain.com>
Subject Re: Partitioned cache and node failures
Date Thu, 14 May 2015 11:15:04 GMT
Ognen, can you share your test via a Jira issue?

It would be very helpful if you could take logs and thread dumps from all
the nodes in the topology and attach them all to the Jira issue.

Thanks!

--
Yakov Zhdanov, Director R&D
*GridGain Systems*
www.gridgain.com

2015-05-12 22:33 GMT+03:00 Ognen Duzlevski <ognen.duzlevski@gmail.com>:

> Dmitriy,
>
> It is not a firewall issue. However, the hardware crash probably has
> something to do with it.
>
> In that direction - can one expect a crash of one node (out of 5) housing a
> few partitioned caches to affect the availability of all the caches? The
> strange thing is visor was able to show them all but acquiring them through
> a Scala app using getOrCreateCache() just hung. I ended up "rigging" visor
> with a capability to dump cache -scan results to a file - I was able to
> salvage all my data and then I restarted the cluster.
>
> Certainly pretty clumsy ;)
>
> Ognen
>
> On Tue, May 12, 2015 at 1:28 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> wrote:
>
> > Ognen,
> >
> > It sounds to me like this is the same issue you had recently with the
> > cloud node crashing due to hardware failure. If this is the case, then it
> > sounds like a firewall issue to me. Are you sure there is no firewall set
> > up between the nodes and they are all deployed in the same availability
> > zone?
> >
> > D.
> >
> > On Tue, May 12, 2015 at 1:33 PM, Yakov Zhdanov <yzhdanov@apache.org>
> > wrote:
> >
> > > Can you please file a ticket and share your sample application with us?
> > >
> > > If it is not possible, then attach verbose logs and thread dumps from
> > > all the nodes after the issue is reproduced.
> > >
> > > Thanks!
> > >
> > > --Yakov
> > >
> > > 2015-05-12 15:30 GMT+03:00 Ognen Duzlevski <ognen.duzlevski@gmail.com>:
> > >
> > > > In a partitioned cache (or set of partitioned caches) - does a single
> > > > node failure mean all of the cache(s) become unavailable?
> > > >
> > > > I am seeing a situation where I cannot access any of the caches
> > > > (using getOrCreateCache) - all my code just "hangs".
> > > >
> > > > The interesting thing is that visor can see all the caches and their
> > > > contents.
> > > >
> > > > What is so special about visor?
> > > >
> > > > I would appreciate it if someone would try to answer any of these (I
> > > > can provide more info), as I am evaluating Ignite for our use in a
> > > > data science/analytics setup :-)
> > > >
> > > > Thanks!
> > > > Ognen
> > > >
> > >
> >
>
