ignite-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ilya Kasnacheev <ilya.kasnach...@gmail.com>
Subject Re: Failed to wait for partition map exchange on cluster activation
Date Wed, 16 Jan 2019 12:13:04 GMT
Hello!

Sorry, wrong thread:)

It is hard to say what happens here exactly. Can you collect several thread
dumps during this prolonged activation, share them with us?

Do you have e.g. services? I was told that services would start during
activation.

Regards,
-- 
Ilya Kasnacheev


ср, 16 янв. 2019 г. в 15:10, Ilya Kasnacheev <ilya.kasnacheev@gmail.com>:

> Hello!
>
> I can see multiple "Failed to process selector key" errors in your log.
> Are you sure that your nodes can communicate via network freely and without
> delay?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> вт, 15 янв. 2019 г. в 20:12, Andrey Davydov <andrey.davydov@gmail.com>:
>
>> Hello,
>>
>> You can find full log there:
>> https://drive.google.com/file/d/1FwCjsXMw5LQJnKO0x5GNJ2w9gVsDbXlc/view?usp=sharing
>>
>> I can rerun tests with additional logging settings if needed
>>
>> Andrey.
>>
>>
>>
>>
>>
>>
>> On Tue, Jan 15, 2019 at 6:23 PM Ilya Kasnacheev <
>> ilya.kasnacheev@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> Can you please upload the full verbose log somewhere?
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> ср, 9 янв. 2019 г. в 20:43, Andrey Davydov <andrey.davydov@gmail.com>:
>>>
>>>>
>>>>
>>>> Hello,
>>>>
>>>> I found in test logs of my project that Ignite warns about failed
>>>> partition maps exchange. In test environment 3 Ignite 2.7 server nodes run
>>>> in the same JVM8 on Win10, using localhost networking.
>>>>
>>>>
>>>>
>>>> 2019-01-09 20:15:27,719 [sys-#164%TestNode-2%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Affinity changes applied in 10 ms.
>>>>
>>>> 2019-01-09 20:15:27,719 [sys-#163%TestNode-1%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Affinity changes applied in 10 ms.
>>>>
>>>> 2019-01-09 20:15:27,724 [sys-#164%TestNode-2%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Full map updating for 5 groups performed in 4 ms.
>>>>
>>>> 2019-01-09 20:15:27,724 [sys-#163%TestNode-1%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Full map updating for 5 groups performed in 5 ms.
>>>>
>>>> 2019-01-09 20:15:27,725 [sys-#163%TestNode-1%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
>>>> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>>> err=null]
>>>>
>>>> 2019-01-09 20:15:27,725 [sys-#164%TestNode-2%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
>>>> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>>> err=null]
>>>>
>>>> 2019-01-09 20:15:28,710 [db-checkpoint-thread-#157%TestNode-1%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint started [checkpointId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
>>>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> checkpointLockWait=0ms, checkpointLockHoldTime=6ms,
>>>> walCpRecordFsyncDuration=248ms, pages=204, reason='node started']
>>>>
>>>> 2019-01-09 20:15:28,713 [db-checkpoint-thread-#151%TestNode-0%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint started [checkpointId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
>>>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> checkpointLockWait=0ms, checkpointLockHoldTime=8ms,
>>>> walCpRecordFsyncDuration=257ms, pages=204, reason='node started']
>>>>
>>>> 2019-01-09 20:15:28,715 [db-checkpoint-thread-#146%TestNode-2%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint started [checkpointId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
>>>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> checkpointLockWait=0ms, checkpointLockHoldTime=22ms,
>>>> walCpRecordFsyncDuration=289ms, pages=204, reason='node started']
>>>>
>>>> 2019-01-09 20:15:30,788 [db-checkpoint-thread-#157%TestNode-1%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint finished [cpId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
>>>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1103ms,
>>>> pagesWrite=84ms, fsync=1992ms, total=3179ms]
>>>>
>>>> 2019-01-09 20:15:30,858 [db-checkpoint-thread-#151%TestNode-0%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint finished [cpId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
>>>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1213ms,
>>>> pagesWrite=79ms, fsync=2066ms, total=3358ms]
>>>>
>>>> 2019-01-09 20:15:30,998 [db-checkpoint-thread-#146%TestNode-2%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint finished [cpId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
>>>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1262ms,
>>>> pagesWrite=79ms, fsync=2203ms, total=3544ms]
>>>>
>>>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
>>>> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>>> node=454d2051-cea6-4f2c-99a7-7c5698494175]. Dumping pending objects that
>>>> might be the cause:
>>>>
>>>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
>>>> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>>>>
>>>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Last exchange future: …
>>>>
>>>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
>>>> - First 10 pending exchange futures [total=0]
>>>>
>>>> 2019-01-09 20:15:37,518 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
>>>> (total: 1):
>>>>
>>>> 2019-01-09 20:15:37,519 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - >>>
>>>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
>>>> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
>>>> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
>>>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>>>> lastExchangeTime=1547054137330, loc=true, ver=2.7.0#20181130-sha1:256ae401,
>>>> isClient=false], done=false]
>>>>
>>>> 2019-01-09 20:15:37,520 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
>>>> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>>>>
>>>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
>>>> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>>> node=a8e212dc-303e-4bc6-8bb6-fa2bc4c0478d]. Dumping pending objects that
>>>> might be the cause:
>>>>
>>>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
>>>> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>>>>
>>>> 2019-01-09 20:15:37,536 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Last exchange future:
>>>>
>>>> 2019-01-09 20:15:37,537 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
>>>> - First 10 pending exchange futures [total=0]
>>>>
>>>> 2019-01-09 20:15:37,538 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
>>>> (total: 1):
>>>>
>>>> 2019-01-09 20:15:37,539 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - >>>
>>>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
>>>> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
>>>> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
>>>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>>>> lastExchangeTime=1547054124462, loc=false,
>>>> ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
>>>>
>>>> 2019-01-09 20:15:37,540 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
>>>> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>>>>
>>>>
>>>>
>>>> After that, everything works fine, but it cause every test to take
>>>> more 10 seconds to run. And I am afraid to use it in prod before I
>>>> understand problem.
>>>>
>>>>
>>>>
>>>> Andrey.
>>>>
>>>>
>>>>
>>>

Mime
View raw message