ignite-user mailing list archives

From Arseny Kovalchuk <arseny.kovalc...@synesis.ru>
Subject Re: Partition eviction failed, this can cause grid hang. (Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted))
Date Tue, 26 Dec 2017 12:39:12 GMT
Hi Andrey.

Thanks for the information. These issues look related to the ones we've
hit. Looking forward to the fixes.

Regards.

Arseny Kovalchuk

Senior Software Engineer at Synesis
skype: arseny.kovalchuk
mobile: +375 (29) 666-16-16
LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>

On 26 December 2017 at 14:49, Andrey Mashenkov <andrey.mashenkov@gmail.com>
wrote:

> Hi Arseny,
>
> It seems this is already fixed [1] in master, but there appears to be
> another issue [2], which we are in the middle of fixing.
> We found some unsafe memory-modifying operations performed without a lock.
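To illustrate why modifying page memory without a lock is dangerous, here is a generic model in plain Java. It is not Ignite's code and not the actual IGNITE-7278 bug (see the JIRA ticket for that); Ignite uses its own per-page locks. The class `GuardedPage`, its 64-byte buffer and the header offsets are hypothetical. A writer updates two related header fields in two steps, so a reader that does not take the lock could observe a "torn" header; with a `ReentrantReadWriteLock` around both the write and the read, the reader can only see complete updates.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Generic model of a page whose header must be updated under a lock (hypothetical layout). */
final class GuardedPage {
    private static final int TYPE_OFF = 0; // assumed offsets, for illustration only
    private static final int VER_OFF = 2;

    private final ByteBuffer page = ByteBuffer.allocateDirect(64);
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Writes both header fields; the pair is only meaningful if written together. */
    void rewriteHeader(short value) {
        lock.writeLock().lock();
        try {
            page.putShort(TYPE_OFF, value);
            page.putShort(VER_OFF, value); // a reader between these two writes would see a torn header
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Returns true if both header fields hold the same value. */
    boolean headerConsistent() {
        lock.readLock().lock();
        try {
            return page.getShort(TYPE_OFF) == page.getShort(VER_OFF);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs one writer and one reader concurrently and counts inconsistent reads. */
    static int countTornReads(int iterations) {
        GuardedPage p = new GuardedPage();
        AtomicInteger torn = new AtomicInteger();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) p.rewriteHeader((short) i);
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) if (!p.headerConsistent()) torn.incrementAndGet();
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return torn.get();
    }

    public static void main(String[] args) {
        System.out.println("torn reads with locking: " + countTornReads(1_000_000));
    }
}
```

Removing the lock calls from `rewriteHeader` and `headerConsistent` is what opens the window for torn reads; the locked version always reports zero.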
>
>
> [1] https://issues.apache.org/jira/browse/IGNITE-6423
> [2] https://issues.apache.org/jira/browse/IGNITE-7278
>
> On Tue, Dec 26, 2017 at 1:02 PM, Arseny Kovalchuk <
> arseny.kovalchuk@synesis.ru> wrote:
>
>> Hi guys.
>>
>> We've hit another issue when using Ignite 2.3 with native persistence
>> enabled. See the details below.
>>
>> We deploy Ignite along with our services in Kubernetes (v1.8) on
>> premises. The Ignite cluster is a StatefulSet of 5 Pods (5 instances) running
>> Ignite 2.3. Each Pod mounts a PersistentVolume backed by Ceph RBD.
>>
>> We write about 230 events/second into Ignite: 70% of the events are
>> ~200 KB in size and 30% are 5000 KB. The smaller events have indexed
>> fields, and we query them via SQL.
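A quick calculation from the figures above shows the scale of the write load. The weighted mean event size is 0.7 × 200 KB + 0.3 × 5000 KB = 1640 KB, so 230 events/s is roughly 377,200 KB/s (about 377 MB/s, or about 75 MB/s per node if spread evenly over 5 nodes). This is raw payload only; serialization overhead, SQL index pages, the write-ahead log, checkpointing and any backup copies come on top. The class name `IngestEstimate` is just for illustration.

```java
/** Back-of-the-envelope ingest estimate using the figures quoted in this thread. */
final class IngestEstimate {
    static final double EVENTS_PER_SEC = 230.0;
    static final double SMALL_SHARE = 0.70, SMALL_KB = 200.0;  // 70% of events, ~200 KB each
    static final double LARGE_SHARE = 0.30, LARGE_KB = 5000.0; // 30% of events, 5000 KB each
    static final int NODES = 5;

    /** Weighted mean event size in KB. */
    static double meanEventKb() {
        return SMALL_SHARE * SMALL_KB + LARGE_SHARE * LARGE_KB;
    }

    /** Raw payload rate in KB/s (no serialization, index, WAL or backup overhead). */
    static double payloadKbPerSec() {
        return EVENTS_PER_SEC * meanEventKb();
    }

    /** Payload rate per node in KB/s, assuming perfectly even distribution. */
    static double payloadKbPerSecPerNode() {
        return payloadKbPerSec() / NODES;
    }

    public static void main(String[] args) {
        System.out.printf("mean event size: %.0f KB%n", meanEventKb());
        System.out.printf("cluster payload: %.0f KB/s (~%.0f MB/s)%n",
                payloadKbPerSec(), payloadKbPerSec() / 1000.0);
        System.out.printf("per-node payload: ~%.0f MB/s%n", payloadKbPerSecPerNode() / 1000.0);
    }
}
```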
>>
>> The cluster is activated from a client node that also streams events
>> from Kafka into Ignite. We use a custom streamer implementation based on
>> the cache.putAll() API.
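For readers unfamiliar with this pattern, a putAll()-based streamer typically buffers incoming records and writes each full batch in one call. The sketch below is hypothetical and is not the streamer used in this thread: `BulkCache` stands in for the `putAll(Map)` method that `IgniteCache` inherits from JCache, and `BatchingStreamer` and its batch size are invented for illustration. Keeping each batch in a `TreeMap` means concurrent batches touch keys in the same sorted order, a common precaution against lock-ordering deadlocks with bulk operations. Ignite also ships `IgniteDataStreamer`, the built-in alternative to hand-rolled batching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical batching streamer that writes each full batch with a single putAll(). */
final class BatchingStreamer<K extends Comparable<K>, V> implements AutoCloseable {

    /** Hypothetical stand-in for the putAll() part of a cache API such as IgniteCache. */
    interface BulkCache<K, V> {
        void putAll(Map<? extends K, ? extends V> entries);
    }

    private final BulkCache<K, V> cache;
    private final int batchSize;
    // Sorted buffer: every batch reaches the cache in ascending key order.
    private final TreeMap<K, V> buffer = new TreeMap<>();

    BatchingStreamer(BulkCache<K, V> cache, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.cache = cache;
        this.batchSize = batchSize;
    }

    /** Buffers one record (a later value for the same key replaces the earlier one). */
    void add(K key, V value) {
        buffer.put(key, value);
        if (buffer.size() >= batchSize) flush();
    }

    /** Sends the buffered batch, if any, in one putAll() call. */
    void flush() {
        if (buffer.isEmpty()) return;
        cache.putAll(new TreeMap<>(buffer)); // copy so later buffer changes never leak into the cache call
        buffer.clear();
    }

    /** Flushes whatever is left when the streamer is closed. */
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        List<Map<String, String>> batches = new ArrayList<>();
        BulkCache<String, String> fake = entries -> batches.add(new TreeMap<>(entries));
        try (BatchingStreamer<String, String> s = new BatchingStreamer<>(fake, 2)) {
            s.add("171:1513946618964:3", "c");
            s.add("171:1513946618964:1", "a");
            s.add("171:1513946618964:2", "b");
        }
        System.out.println(batches);
    }
}
```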
>>
>> We started the cluster from scratch, without any persistent data. After
>> a while the data became corrupted, and we got the following error:
>>
>> [2017-12-26 07:44:14,251] ERROR [sys-#127%ignite-instance-2%]
>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader:
>> - Partition eviction failed, this can cause grid hang.
>> class org.apache.ignite.IgniteException: Runtime failure on search row:
>> Row@5b1479d6[ key: 171:1513946618964:3008806055072854, val:
>> ru.synesis.kipod.event.KipodEvent [idHash=510912646, hash=-387621419,
>> face_last_name=null, face_list_id=null, channel=171, source=,
>> face_similarity=null, license_plate_number=null, descriptors=null,
>> cacheName=kipod_events, cacheKey=171:1513946618964:3008806055072854,
>> stream=171, alarm=false, processed_at=0, face_id=null, id=3008806055072854,
>> persistent=false, face_first_name=null, license_plate_first_name=null,
>> face_full_name=null, level=0, module=Kpx.Synesis.Outdoor,
>> end_time=1513946624379, params=null, commented_at=0, tags=[vehicle, 0,
>> human, 0, truck, 0, start_time=1513946618964, processed=false,
>> kafka_offset=111259, license_plate_last_name=null, armed=false,
>> license_plate_country=null, topic=MovingObject, comment=,
>> expiration=1514033024000, original_id=null, license_plate_lists=null], ver:
>> GridCacheVersion [topVer=125430590, order=1513955001926, nodeOrder=3] ][
>> 3008806055072854, MovingObject, Kpx.Synesis.Outdoor, 0, , 1513946618964,
>> 1513946624379, 171, 171, FALSE, FALSE, , FALSE, FALSE, 0, 0, 111259,
>> 1514033024000, (vehicle, 0, human, 0, truck, 0), null, null, null, null,
>> null, null, null, null, null, null, null, null ]
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1787)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1578)
>> at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:216)
>> at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:496)
>> at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:423)
>> at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:580)
>> at org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2334)
>> at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:461)
>> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1453)
>> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1416)
>> at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1271)
>> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374)
>> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233)
>> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588)
>> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:951)
>> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:809)
>> at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
>> at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
>> at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6631)
>> at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
>> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95)
>> at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:148)
>> at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102)
>> at org.apache.ignite.internal.processors.query.h2.database.H2RowFactory.getRow(H2RowFactory.java:62)
>> at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:126)
>> at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:36)
>> at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:123)
>> at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:40)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.getRow(BPlusTree.java:4372)
>> at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:200)
>> at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:40)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:4359)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:4279)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1500(BPlusTree.java:81)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:261)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4697)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4682)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:158)
>> at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:319)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1823)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
>> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1752)
>> ... 23 more
>>
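Reading the trace bottom-up: partition eviction (`GridDhtLocalPartition.tryEvict`, then `clearAll`) removes each entry, which also removes its row from the H2 SQL index. While descending the index B+ tree, the tree compares the row being removed with rows it loads by link (`CacheDataRowAdapter.initFromLink`), and resolving the IO handler for the linked page fails in `IOVersions.forVersion`. The sketch below is a simplified, hypothetical model of that last step only: the header layout (2-byte type at offset 0, 2-byte version at offset 2) and the class `PageIoLookup` are assumptions, not Ignite's actual format or code. It shows why a page whose header no longer holds a valid version, for example one zeroed or overwritten by an unsynchronized write, produces exactly this kind of "page content is corrupted" error.

```java
import java.nio.ByteBuffer;
import java.util.List;

/** Simplified, hypothetical model of resolving a page's IO handler from its header. */
final class PageIoLookup {
    static final int TYPE_OFF = 0; // assumed layout, for illustration only
    static final int VER_OFF = 2;

    /** Registered handlers for one page type; index 0 holds version 1. */
    private final List<String> versions;

    PageIoLookup(List<String> versions) {
        this.versions = List.copyOf(versions);
    }

    /** Version 0 or an unregistered version cannot describe a valid page. */
    String forVersion(int ver) {
        if (ver < 1 || ver > versions.size())
            throw new IllegalStateException("Failed to get page IO instance (page content is corrupted)");
        return versions.get(ver - 1);
    }

    /** Reads the version field from the page header and resolves its handler. */
    String forPage(ByteBuffer page) {
        return forVersion(Short.toUnsignedInt(page.getShort(VER_OFF)));
    }

    public static void main(String[] args) {
        PageIoLookup dataPageIo = new PageIoLookup(List.of("DataPageIO v1", "DataPageIO v2"));

        ByteBuffer valid = ByteBuffer.allocate(64);
        valid.putShort(TYPE_OFF, (short) 1).putShort(VER_OFF, (short) 2);
        System.out.println(dataPageIo.forPage(valid));

        ByteBuffer zeroed = ByteBuffer.allocate(64); // header never written, or wiped by a racing writer
        try {
            dataPageIo.forPage(zeroed);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```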
>>
>> After restart we also
>>
>>
>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>
