ignite-user mailing list archives

From "Roger Fischer (CW)" <rfis...@Brocade.com>
Subject Missing object when loading from Cassandra with multiple queries
Date Tue, 25 Jul 2017 22:46:06 GMT

I am using Ignite with Cassandra, loading data from Cassandra on demand using multiple queries.
But only a (seemingly random) subset of the rows/objects is loaded into Ignite.

When I load using a single query, all rows/objects are loaded correctly.

In another environment, the same data, config and code loads correctly with multiple queries.
The main difference is that this environment uses older, slower computers.

When I repeat the same load request multiple times, each repetition adds a few more rows/objects,
until eventually all the rows/objects are loaded into the cache.

The environment where it works (all matching rows are loaded into the cache) is a cluster
of 3 old desktop machines with 2 cores each. The same 3 nodes also run Cassandra.

The environment where only some of the rows are loaded is a cluster of 3 modern servers (VMs)
with 8 cores each. The same 3 nodes also run Cassandra.

The only theory I have at this time is that with more cores, more queries / inserts are executed
in parallel and something goes wrong with that higher level of parallelism.

I am calling loadCache(null, String[]). The string array has 7 queries, one for each partition
in Cassandra.

I have verified the queries; they return the correct rows when executed in cqlsh.

There are no error, warning or info messages in the logs during the load, neither on the client
nor on the 3 servers.

I turned on additional logging in the environment that has the problem. Because I am loading
15K rows, there are thousands of log entries, and analysis is difficult. However, the following
entries stand out:

[19:07:45,946][FINE][pool-12-thread-4][GridQueryProcessor] Store [space=FcPortStatsCache,
key=com.abc.poc.icpoc.model.FcPortStatsKey [idHash=1690890517, hash=-725839099, hour=Fri Jul
07 15:00:00 UTC 2017, bucket=3, dateTime=Fri Jul 07 15:09:00 UTC 2017, portId=007cb4ec-4dfd-47e5-8ae7-4781da08ac7c],
val=com.abc.poc.icpoc.model.FcPortStats [idHash=117595137, hash=-1000695178, portId=007cb4ec-4dfd-47e5-8ae7-4781da08ac7c,
dateTime=Fri Jul 07 15:09:00 UTC 2017, portType=1, switchId=339c5bcd-b503-4d54-948e-7ae6c40f31c2,
rxUtil=78.479034, txUtil=91.488396, higherUtil=91.488396, lowerUtil=78.479034, rxRate=18411.0,
txRate=15424.0, higherRate=18411.0, lowerRate=15424.0, c3Discards=0.0, crcErrors=0.0]]

There are 15,414 of these entries. Only 5,627 objects were loaded into the cache, while 15,000 rows
match the queries.

[19:07:45,944][FINE][pool-12-thread-6][GridDhtAtomicCache] <FcPortStatsCache> Remove
will not be done for key (entry got replaced or removed): com.abc.poc.icpoc.model.FcPortStatsKey
[idHash=326667456, hash=627195288, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=5, dateTime=Fri
Jul 07 15:09:00 UTC 2017, portId=00159ca0-a6c9-47e5-a1f6-f3fe12941ba1]

There is exactly the same number of these (15,414).

[19:07:45,945][FINE][pool-12-thread-6][GridDhtAtomicCache] <FcPortStatsCache> Ignoring
entry for partition that does not belong [key=com.abc.poc.icpoc.model.FcPortStatsKey [idHash=235959795,
hash=1960141096, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=5, dateTime=Fri Jul 07 15:09:00
UTC 2017, portId=001648ce-5410-472b-ac32-5b8056b30674], val=FcPortStats: 001648ce-5410-472b-ac32-5b8056b30674,
Fri Jul 07 15:09:00 UTC 2017, 1; 53498d64-e53f-4136-a0db-7be9e200cf84 ..., err=class org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtInvalidPartitionException
[part=509, msg=Creating partition which does not belong to local node (often may be caused
by inconsistent 'key.hashCode()' implementation) [part=509, topVer=AffinityTopologyVersion
[topVer=-1, minorTopVer=0], this.topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0]]]]

There are 5,685 of these entries, far fewer than the number of missing objects (15,000 - 5,627 = 9,373).
The message suggests that the key's hashCode() implementation may be inconsistent. But would that
not also apply in the other (slow, but working) environment, or when loading with a single query?
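One way to rule out the key class itself is to check it against the Java equals/hashCode contract. The sketch below is a hypothetical reconstruction of FcPortStatsKey, not the actual class: the field names come from the log output above, while the field types (Date, int, Date, String) are assumptions. It only illustrates what a consistent key looks like; it does not claim that this is how Ignite computes affinity internally.

```java
import java.util.Date;
import java.util.Objects;

/**
 * Hypothetical key class modeled on the fields printed in the logs
 * (hour, bucket, dateTime, portId). Field types are assumptions.
 */
final class FcPortStatsKey {
    private final Date hour;
    private final int bucket;
    private final Date dateTime;
    private final String portId;

    FcPortStatsKey(Date hour, int bucket, Date dateTime, String portId) {
        // Defensive copies: java.util.Date is mutable, and a key whose
        // fields change after insertion would also change its hash code.
        this.hour = new Date(hour.getTime());
        this.bucket = bucket;
        this.dateTime = new Date(dateTime.getTime());
        this.portId = Objects.requireNonNull(portId);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FcPortStatsKey)) return false;
        FcPortStatsKey k = (FcPortStatsKey) o;
        return bucket == k.bucket
            && hour.equals(k.hour)
            && dateTime.equals(k.dateTime)
            && portId.equals(k.portId);
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields compared in equals(), and only types whose
        // hashCode() is specified deterministically (Date, int, String), so
        // equal keys hash the same way on every node and in every JVM.
        return Objects.hash(hour, bucket, dateTime, portId);
    }
}
```

If the real key mixes in anything that equals() ignores, or anything whose hash differs between JVMs (for example an identity hash), equal keys could map to different partitions on different nodes.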

ver. 2.0.0#20170430-sha1:d4eef3c6

Any suggestions?


