ignite-dev mailing list archives

From Yakov Zhdanov <yzhda...@gridgain.com>
Subject Re: ignite 1.4 status
Date Sat, 19 Sep 2015 15:46:35 GMT
Alex & Val, I reviewed your changes and they seem good to me. Good catches!
However, the TC state is not acceptable. I am afraid you touched some
fundamentals of TCP discovery:

[06:06:31]W: [org.apache.ignite:ignite-core]
[06:06:31,976][ERROR][tcp-disco-msg-worker-#16%atomic.GridCacheValueConsistencyAtomicSelfTest2][TcpDiscoverySpi]
Runtime error caught during grid runnable execution: IgniteSpiThread
[name=tcp-disco-msg-worker-#16%atomic.GridCacheValueConsistencyAtomicSelfTest2]
[06:06:31]W: [org.apache.ignite:ignite-core] java.lang.AssertionError:
Topology version has not been updated: [ring=TcpDiscoveryNodesRing
[locNode=TcpDiscoveryNode [id=20776610-8b3e-4bd4-92f6-8da1f618d002,
addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=3,
intOrder=3, lastExchangeTime=1442631991969, loc=true,
ver=1.4.0#19700101-sha1:00000000, isClient=false], nodes=[TcpDiscoveryNode
[id=00b46a2e-f5fc-4dfc-bfb0-9042f2da0000, addrs=[127.0.0.1], sockAddrs=[/
127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
lastExchangeTime=1442631951804, loc=false,
ver=1.4.0#19700101-sha1:00000000, isClient=false], TcpDiscoveryNode
[id=10002b17-a015-4a12-aad1-7829b80bb001, addrs=[127.0.0.1], sockAddrs=[/
127.0.0.1:47501], discPort=47501, order=2, intOrder=2,
lastExchangeTime=1442631951804, loc=false,
ver=1.4.0#19700101-sha1:00000000, isClient=false], TcpDiscoveryNode
[id=20776610-8b3e-4bd4-92f6-8da1f618d002, addrs=[127.0.0.1], sockAddrs=[/
127.0.0.1:47502], discPort=47502, order=3, intOrder=3,
lastExchangeTime=1442631991969, loc=true, ver=1.4.0#19700101-sha1:00000000,
isClient=false], TcpDiscoveryNode [id=30a3b148-11f5-4b21-ab65-8912748b3003,
addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47503], discPort=47503, order=4,
intOrder=4, lastExchangeTime=1442631991907, loc=false,
ver=1.4.0#19700101-sha1:00000000, isClient=false]], topVer=5, nodeOrder=4],
msg=TcpDiscoveryNodeAddFinishedMessage
[nodeId=30a3b148-11f5-4b21-ab65-8912748b3003,
super=TcpDiscoveryAbstractMessage
[sndNodeId=00b46a2e-f5fc-4dfc-bfb0-9042f2da0000,
id=bb9a093ef41-00b46a2e-f5fc-4dfc-bfb0-9042f2da0000,
verifierNodeId=00b46a2e-f5fc-4dfc-bfb0-9042f2da0000, topVer=4,
pendingIdx=0, isClient=false]], lastMsg=TcpDiscoveryNodeLeftMessage
[super=TcpDiscoveryAbstractMessage
[sndNodeId=10002b17-a015-4a12-aad1-7829b80bb001,
id=b005ec3ef41-30a3b148-11f5-4b21-ab65-8912748b3003,
verifierNodeId=00b46a2e-f5fc-4dfc-bfb0-9042f2da0000, topVer=5,
pendingIdx=0, isClient=false]], spiState=CONNECTED]
[06:06:31]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:3371)
[06:06:31]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:1994)
[06:06:31]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5268)
[06:06:31]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)


[06:11:40]W: [org.apache.ignite:ignite-core]
[06:11:40,190][ERROR][tcp-disco-msg-worker-#1490%replicated.GridCacheSyncReplicatedPreloadSelfTest7][TcpDiscoverySpi]
Runtime error caught during grid runnable execution: IgniteSpiThread
[name=tcp-disco-msg-worker-#1490%replicated.GridCacheSyncReplicatedPreloadSelfTest7]
[06:11:40]W: [org.apache.ignite:ignite-core] java.lang.AssertionError
[06:11:40]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:4180)
[06:11:40]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2015)
[06:11:40]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5268)
[06:11:40]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[06:11:40]W: [org.apache.ignite:ignite-core] Exception in thread
"tcp-disco-msg-worker-#1490%replicated.GridCacheSyncReplicatedPreloadSelfTest7"
java.lang.AssertionError
[06:11:40]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:4180)
[06:11:40]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2015)
[06:11:40]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5268)
[06:11:40]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)


--
Yakov Zhdanov, Director R&D
*GridGain Systems*
www.gridgain.com

2015-09-19 4:07 GMT+03:00 Alexey Goncharuk <alexey.goncharuk@gmail.com>:

> Yakov,
>
> Valentin and I debugged the issue with ignite-1171 and I think we got to
> the bottom of it. First of all, pending messages were not reset to the
> correct collection on the joining node, which resulted in skipped custom
> event notifications. Second, the check that you added to avoid discarding
> custom messages was checking the wrong variable and the wrong type :) After
> we fixed those two issues, the test seems to pass. Please review my changes
> again.
>
> --AG
>
> 2015-09-18 14:10 GMT-07:00 Yakov Zhdanov <yzhdanov@apache.org>:
>
> > Igniters,
> >
> > While working on ignite-1171 we discovered a couple more issues in
> > discovery that might threaten custom event processing under some
> > circumstances (we have continuous processes based on this logic, for
> > example).
> >
> > Alexey Goncharuk has picked this up.
> >
> > Another critical issue discovered today -
> > https://issues.apache.org/jira/browse/IGNITE-1516 - performance drop in
> > offheap query benchmark. Semyon will be fixing it.
> >
> > https://issues.apache.org/jira/browse/IGNITE-973 - Sergi has come to the
> > conclusion that a race is still present in the cache offheap swap logic.
> > Currently this is assigned to Semyon, too.
> >
> > We need to postpone the release until the very beginning of next week.
> >
> > --Yakov
> >
> > 2015-09-18 12:01 GMT+03:00 Yakov Zhdanov <yzhdanov@apache.org>:
> >
> > > Alex, I think that your approach of delaying custom messages will work.
> > > As for coordinator crash protection, we guarantee delivery of certain
> > > message types (including custom messages). This logic was implemented
> > > long ago and seems to work. So, the message just gets resent.
> > >
> > > Semyon, can you please take a look at Alex's changes?
> > >
> > > --Yakov
> > >
> > > 2015-09-18 3:24 GMT+03:00 Alexey Goncharuk <alexey.goncharuk@gmail.com>:
> > >
> > >> Yakov,
> > >>
> > >> The approach with collecting discovery data on the NodeAddFinished
> > >> message does not work because this message gets relayed to clients
> > >> before it passes the whole ring. If we make it pass the ring and relay
> > >> it to clients on the second round, we get the same race that I was
> > >> fixing.
> > >>
> > >> I think the correct approach here is to delay custom event messages
> > >> while a node join is in progress - basically, do not allow custom
> > >> messages between the NodeAddedMessage and the NodeAddFinished message.
> > >> I implemented a very simple fix in ignite-1171; however, I need you or
> > >> someone else with good expertise in the discovery protocol to take a
> > >> look at my changes because I am sure I missed something - e.g. I am not
> > >> sure how delayed messages should be handled when the coordinator node
> > >> crashes.
> > >>
> > >> 2015-09-17 8:52 GMT-07:00 Yakov Zhdanov <yzhdanov@gridgain.com>:
> > >>
> > >> > Alex, I think it makes sense to continue investigating this. We can
> > >> > discuss whether we include or skip the fix once the fix is ready.
> > >> >
> > >> > As for the other tickets:
> > >> >
> > >> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20assignee%20ASC%2C%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > >> >
> > >> > IGNITE-1171 Getting affinity for topology version earlier than
> > >> > affinity is calculated - is on Alex Goncharuk.
> > >> > IGNITE-973 Failed to get value for key: 13791. at
> > >> > o.a.i.i.processors.query.h2.opt.GridH2AbstractKeyValueRow.getValue(GridH2AbstractKeyValueRow.java:223)
> > >> > - assigned to Sergi. There seems to be a problem in offheap indexing
> > >> > which can be reproduced from time to time. This is an old issue and I
> > >> > think it can be postponed if it does not fit.
> > >> >
> > >> > +1 IGFS issue
> > >> > and the rest are vert.x issues
> > >> >
> > >> > I hope IGNITE-1171 will be fixed today so the picture becomes much
> > >> > clearer.
> > >> >
> > >> > --
> > >> > Yakov Zhdanov, Director R&D
> > >> > *GridGain Systems*
> > >> > www.gridgain.com
> > >> >
> > >> > 2015-09-17 0:59 GMT+03:00 Alexey Goncharuk <alexey.goncharuk@gmail.com>:
> > >> >
> > >> > > Yakov, Igniters,
> > >> > >
> > >> > > I have found at least one issue related to the ignite-1171 hang; it
> > >> > > is caused by a race between a discovery custom message and the
> > >> > > collectDiscoveryData() call (I updated the ticket). I remember we
> > >> > > wanted to call collectDiscoveryData() during NodeAddFinishedMessage
> > >> > > processing, but it was not implemented. Do we think this is a
> > >> > > correct change, and do we want it to be fixed in 1.4? Discovery
> > >> > > changes are quite sensitive and I would prefer them to be tested
> > >> > > thoroughly.
> > >> > >
> > >> > > 2015-09-16 9:09 GMT-07:00 Yakov Zhdanov <yzhdanov@apache.org>:
> > >> > >
> > >> > > > Guys,
> > >> > > >
> > >> > > > I want to update release status.
> > >> > > >
> > >> > > > Testing has revealed some cache issues which should be fixed with
> > >> > > > the release. Moreover, it turned out that these issues block the
> > >> > > > vert.x release. So, if we fix them we can consider including
> > >> > > > vert.x in the 1.4 release, which is good, I think.
> > >> > > >
> > >> > > > I think that Alex Goncharuk is the best person to look into the
> > >> > > > vert.x issues. Alex, please first of all pay attention to
> > >> > > > IGNITE-1171 - Getting affinity for topology version earlier than
> > >> > > > affinity is calculated. A test reproducing the issue has been
> > >> > > > added to ignite1.4. Alex, please let us know if this can be fixed.
> > >> > > >
> > >> > > > These issues are on Semyon Boikov:
> > >> > > >
> > >> > > > IGNITE-973 Failed to get value for key: 13791. at
> > >> > > > o.a.i.i.processors.query.h2.opt.GridH2AbstractKeyValueRow.getValue(GridH2AbstractKeyValueRow.java:223)
> > >> > > > - We need more time to finish with this. Some race in swap is
> > >> > > > still there.
> > >> > > > IGNITE-1452 OptimizedMarshaller.unmarshal hangs in
> > >> > > > IgniteCacheQueryNodeRestartSelfTest2 - Need to check TC and merge.
> > >> > > >
> > >> > > > The rest of the tickets are vert.x related. Here is the link -
> > >> > > >
> > >> > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20assignee%20ASC%2C%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > >> > > >
> > >> > > > Andrey Gura, please provide as much information as you can for the
> > >> > > > rest of the vert.x tickets.
> > >> > > >
> > >> > > > Thanks!
> > >> > > >
> > >> > > > --Yakov
> > >> > > >
> > >> > > > 2015-09-15 19:12 GMT+03:00 Yakov Zhdanov <yzhdanov@apache.org>:
> > >> > > >
> > >> > > > > Raul, how is your status with the streamer? I think there is no
> > >> > > > > reason to rush. We can put it into 1.5. Please let me know what
> > >> > > > > you think.
> > >> > > > >
> > >> > > > > As for the release status, here are the open tickets -
> > >> > > > >
> > >> > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20assignee%20ASC%2C%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > >> > > > >
> > >> > > > > https://issues.apache.org/jira/browse/IGNITE-1239 - Alex
> > >> > > > > Goncharuk, can you please let us know if this will be finished
> > >> > > > > today?
> > >> > > > > https://issues.apache.org/jira/browse/IGNITE-1490 - Ilya Suntsov
> > >> > > > > is working on reproducing this. I suspect we may have problems
> > >> > > > > with near cache evictions. Can Val or Alex proceed with this
> > >> > > > > after Ilya finishes the test run? Ilya, please respond in the
> > >> > > > > ticket with your results.
> > >> > > > >
> > >> > > > > Thanks!
> > >> > > > >
> > >> > > > > --Yakov
> > >> > > > >
> > >> > > > > 2015-09-15 11:15 GMT+03:00 Raul Kripalani <raul@evosent.com>:
> > >> > > > >
> > >> > > > >> Hi guys,
> > >> > > > >>
> > >> > > > >> The MQTT streamer I'm working on will be ready this week,
> > >> > > > >> hopefully as soon as today or tomorrow.
> > >> > > > >>
> > >> > > > >> It's not important for the 1.4 release, but it seems like it'll
> > >> > > > >> make the timeline to potentially get merged.
> > >> > > > >>
> > >> > > > >> Regards,
> > >> > > > >> Raúl.
> > >> > > > >> On 15 Sep 2015 00:05, "Yakov Zhdanov" <yzhdanov@apache.org> wrote:
> > >> > > > >>
> > >> > > > >> > Guys,
> > >> > > > >> >
> > >> > > > >> > Current status is the following:
> > >> > > > >> >
> > >> > > > >> > 1. Sam needs to merge his fixes after TC is finished.
> > >> > > > >> > 2. Some minor changes pending from Denis + release notes fix
> > >> > > > >> > pointed out by Dmitry.
> > >> > > > >> > 3. Several suites are still red on TC.
> > >> > > > >> >
> > >> > > > >> > I have moved plenty of tickets to ignite-1.5. Here is the
> > >> > > > >> > link to the currently open tickets that I want everyone
> > >> > > > >> > (esp. assignees) to look through and tell me whether each
> > >> > > > >> > ticket can be moved or should be fixed -
> > >> > > > >> >
> > >> > > > >> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%20ignite-1.4%20AND%20status%20!%3D%20closed%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC
> > >> > > > >> >
> > >> > > > >> > Alex Goncharuk has 5 tickets.
> > >> > > > >> > Semyon Boikov has 5 tickets.
> > >> > > > >> > Valentin has 4
> > >> > > > >> > Sergi has 4
> > >> > > > >> > Vladimir has 3
> > >> > > > >> > Ivan V. has 3
> > >> > > > >> >
> > >> > > > >> > Guys, please look through your tickets and let us know your
> > >> > > > >> > decision.
> > >> > > > >> >
> > >> > > > >> > --Yakov
> > >> > > > >> >
> > >> > > > >> > 2015-09-14 21:04 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > >> > > > >> >
> > >> > > > >> > > Yakov,
> > >> > > > >> > >
> > >> > > > >> > > I know you were managing the 1.4 release. Can you please
> > >> > > > >> > > provide an update on what goes into the release at this
> > >> > > > >> > > point and what the overall plan is?
> > >> > > > >> > >
> > >> > > > >> > > Thanks,
> > >> > > > >> > > D.
> > >> > > > >> > >
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
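
For readers following the thread: the idea Alexey describes (do not allow custom messages between NodeAddedMessage and NodeAddFinishedMessage) can be pictured with a small stand-alone sketch. This is not the code from the ignite-1171 branch and it is not Ignite's ServerImpl. The class name, the method names and the use of plain strings as messages are all assumptions made for illustration. It also ignores the coordinator-crash case and overlapping joins, which the thread leaves as open questions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical illustration only: buffers custom messages while a node
 * join is in progress and releases them, in arrival order, once the join
 * has finished. Not taken from Ignite's discovery implementation.
 */
class JoinAwareCustomMessageBuffer {
    /** Custom messages held back while a join is in progress. */
    private final Queue<String> delayed = new ArrayDeque<>();

    /** Custom messages that have been passed on for processing. */
    private final List<String> delivered = new ArrayList<>();

    /** True between the "node added" and "node add finished" events. */
    private boolean joinInProgress;

    /** Stands in for observing a NodeAddedMessage: a join has started. */
    void onNodeAdded() {
        joinInProgress = true;
    }

    /** Stands in for observing a NodeAddFinishedMessage: flush the backlog. */
    void onNodeAddFinished() {
        joinInProgress = false;

        while (!delayed.isEmpty())
            delivered.add(delayed.poll());
    }

    /** Delivers the message right away unless a join is in progress. */
    void onCustomMessage(String msg) {
        if (joinInProgress)
            delayed.add(msg);
        else
            delivered.add(msg);
    }

    /** Returns a copy of the messages delivered so far, in order. */
    List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    /** Returns the number of messages currently held back. */
    int delayedCount() {
        return delayed.size();
    }
}
```

In this sketch a custom message that arrives after onNodeAdded() is only delivered after onNodeAddFinished(), so a joining node never sees a custom event in the middle of its join. A single boolean flag is enough only if joins do not overlap; handling several concurrent joins would need a counter or a per-node set instead.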
