ignite-dev mailing list archives

From Yakov Zhdanov <yzhda...@apache.org>
Subject Re: Async cache groups rebalance not started with rebalanceOrder ZERO
Date Wed, 18 Jul 2018 12:13:02 GMT
Maxim, I checked, and it seems that the send retry count is used only in the
cache IO manager, and that usage is semantically very far from what I suggest.
The resend count limits the number of send attempts, while I meant a successful
send followed by possible problems on the supplier side.

--Yakov
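
To make the distinction concrete, here is a minimal sketch of the demander-side timeout check proposed earlier in the thread (quoted below). It is not Ignite code: the class SupplyTimeoutTracker and its methods are hypothetical names, time is passed in explicitly to keep the example deterministic, and re-sending the demand message is left to the caller. Unlike a send retry counter, it reacts to a demand that was sent successfully but never answered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Ignite): tracks demands that still await a supply message. */
class SupplyTimeoutTracker {
    private final String cache;
    private final long timeoutMs;

    /** Partition id -> time (ms) at which the last demand message was sent. */
    private final Map<Integer, Long> pending = new TreeMap<>();

    SupplyTimeoutTracker(String cache, long timeoutMs) {
        this.cache = cache;
        this.timeoutMs = timeoutMs;
    }

    /** Called after a demand message was sent successfully. */
    void onDemandSent(int partId, long nowMs) {
        pending.put(partId, nowMs);
    }

    /** Called when the supplier answered for the given partition. */
    void onSupplyReceived(int partId) {
        pending.remove(partId);
    }

    /**
     * Returns one warning per partition whose supply message is overdue and
     * restarts its timer. The caller is expected to log the warnings and
     * repeat the demand for those partitions.
     */
    List<String> checkTimeouts(long nowMs) {
        List<String> warnings = new ArrayList<>();

        for (Map.Entry<Integer, Long> e : pending.entrySet()) {
            if (nowMs - e.getValue() >= timeoutMs) {
                warnings.add("Failed to wait for supply message from node within "
                    + timeoutMs / 1000 + " secs [cache=" + cache
                    + ", partId=" + e.getKey() + "]");

                e.setValue(nowMs); // Restart the timer for the repeated demand.
            }
        }

        return warnings;
    }
}
```

Because the timer restarts on every timeout, the check applies to each demand message throughout rebalancing, not only the first one.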

2018-07-17 19:01 GMT+03:00 Maxim Muzafarov <maxmuzaf@gmail.com>:

> Yakov,
>
> But we already have DFLT_SEND_RETRY_CNT and DFLT_SEND_RETRY_DELAY for
> configuring our CommunicationSPI behavior. What if a user configures these
> parameters in their own way and then sees a lot of WARN messages in the log
> that make no sense?
>
> Maybe we could use GridCachePartitionExchangeManager#forceRebalance (or maybe
> forceReassign) if rebalancing still fails after all those retries. What do
> you think?
>
>
>
> Mon, 16 Jul 2018 at 21:12, Yakov Zhdanov <yzhdanov@gridgain.com>:
>
> > Maxim, I looked at the code you provided. I think we need to add some
> > timeout validation on the demander side: if no supply message arrives
> > within 30 secs, output a warning to the logs and repeat the demanding
> > process. This should apply to any demand message throughout the
> > rebalancing process, not only the first one.
> >
> > You can use the following message
> >
> > Failed to wait for supply message from node within 30 secs [cache=C,
> > partId=XX]
> >
> > Alex Goncharuk, do you have any comments here?
> >
> > Yakov Zhdanov
> > www.gridgain.com
> >
> > 2018-07-14 19:45 GMT+03:00 Maxim Muzafarov <maxmuzaf@gmail.com>:
> >
> > > Yakov,
> > >
> > > Yes, you're right. The whole rebalancing progress will be stopped.
> > >
> > > Actually, you're right about that too: rebalancing order doesn't matter.
> > > The javadoc just describes how rebalancing should work for caches, but in
> > > fact it doesn't work as described. Personally, I'd prefer to start the
> > > rebalance of each cache group asynchronously and independently.
> > >
> > > Please look at my reproducer [1].
> > >
> > > Scenario:
> > > A cluster with two REPLICATED caches.
> > > A new node starts.
> > > Rebalancing of the first cache group fails to start (e.g. network
> > > issues); that's OK.
> > > Rebalancing of the second cache group is never started, and all further
> > > progress gets stuck (I think rebalancing should be started here!).
> > >
> > >
> > > [1]
> > > https://github.com/Mmuzaf/ignite/blob/rebalance-cancel/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/rebalancing/GridCacheRebalancingCancelSelfTest.java
> > >
> > > Fri, 13 Jul 2018 at 17:46, Yakov Zhdanov <yzhdanov@apache.org>:
> > >
> > > > Maxim, I do not understand the problem. Imagine I do not have any
> > > > ordering, but rebalancing of some cache fails to start. In my
> > > > understanding, overall rebalancing progress then becomes blocked. Is
> > > > that true?
> > > >
> > > > Can you please provide a reproducer for your problem?
> > > >
> > > > --Yakov
> > > >
> > > > 2018-07-09 16:42 GMT+03:00 Maxim Muzafarov <maxmuzaf@gmail.com>:
> > > >
> > > > > Hello Igniters,
> > > > >
> > > > > Each cache group has a “rebalance order” property. As the javadoc for
> > > > > getRebalanceOrder() says: “Note that cache with order {@code 0} does
> > > > > not participate in ordering. This means that cache with rebalance
> > > > > order {@code 0} will never wait for any other caches. All caches with
> > > > > order {@code 0} will be rebalanced right away concurrently with each
> > > > > other and ordered rebalance processes. If not set, cache order is 0,
> > > > > i.e. rebalancing is not ordered.”
> > > > >
> > > > > In fact, GridCachePartitionExchangeManager always builds a chain of
> > > > > rebalancing cache groups to start (even for cache order ZERO):
> > > > >
> > > > > ignite-sys-cache -> cacheR -> cacheR3 -> cacheR2 -> cacheR5 -> cacheR1.
> > > > >
> > > > > If one of these groups fails to start, the groups after it will never
> > > > > be run.
> > > > >
> > > > > *Question 1*: Should we fix the javadoc description or create a bug
> > > > > for fixing this rebalance behavior?
> > > > >
> > > > > [1]
> > > > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCachePartitionExchangeManager.java#L2630
> > > > >
> > > >
> > > --
> > > --
> > > Maxim Muzafarov
> > >
> >
> --
> --
> Maxim Muzafarov
>
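
The chaining behavior described in the first message of the thread can be illustrated with a small sketch. It is not taken from GridCachePartitionExchangeManager: the class RebalanceStartOrder, its methods, and the startFails predicate are hypothetical, and plain CompletableFuture stands in for Ignite's internal futures. It only shows why a failed start in a chain blocks every later group, while independent starts do not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;

/** Hypothetical illustration (not Ignite code) of chained vs. independent rebalance start. */
class RebalanceStartOrder {
    /** Each group starts only after the previous group started successfully. */
    static List<String> startChained(List<String> groups, Predicate<String> startFails) {
        List<String> started = new ArrayList<>();
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);

        // Once one link fails, thenCompose never invokes the remaining links.
        for (String grp : groups)
            chain = chain.thenCompose(v -> start(grp, startFails, started));

        return started;
    }

    /** Each group starts regardless of what happened to the others. */
    static List<String> startIndependently(List<String> groups, Predicate<String> startFails) {
        List<String> started = new ArrayList<>();

        for (String grp : groups)
            start(grp, startFails, started);

        return started;
    }

    /** Simulates a rebalance start: records the group or returns a failed future. */
    private static CompletableFuture<Void> start(String grp, Predicate<String> startFails,
        List<String> started) {
        if (startFails.test(grp)) {
            CompletableFuture<Void> failed = new CompletableFuture<>();
            failed.completeExceptionally(
                new IllegalStateException("Failed to start rebalance: " + grp));
            return failed;
        }

        started.add(grp);

        return CompletableFuture.completedFuture(null);
    }
}
```

With groups cacheR, cacheR3, cacheR2 and a failing start for cacheR3, the chained variant starts only cacheR, while the independent variant also starts cacheR2.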
