ignite-user mailing list archives

From Stanislav Lukyanov <stanlukya...@gmail.com>
Subject RE: Question about data distribution
Date Mon, 29 Jan 2018 09:13:22 GMT
AFAIU you simultaneously have about 1 million entries which correspond to 5-10 groups (measurements).
Is that correct?
If so, that might be the reason for the distribution that you see. Even though you have a lot of entries,
you only have a few distinct affinity-mapped values. Ignite tries to keep all entries with the same
affinity key (here, the same measurement) on the same node, and each measurement has a ~50% chance of
ending up on the first or the second node. For 5 measurements, that gives a (1/2)^5 ≈ 3% chance that
everything lands on, say, the first node (about 6% for all data ending up on either single node).
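For illustration, the probability can be checked with a quick standalone calculation (this is plain Java, not Ignite code; node and measurement counts are the ones from this thread):

```java
public class AffinitySkew {
    public static void main(String[] args) {
        int nodes = 2;
        int measurements = 5;
        // Each measurement independently maps to one of the nodes with
        // equal probability, so all 5 hit one particular node with
        // probability (1/nodes)^measurements.
        double oneGivenNode = Math.pow(1.0 / nodes, measurements);
        // "All on the same node, whichever it is" is nodes times that.
        double anySingleNode = nodes * oneGivenNode;
        System.out.println("P(all on a given node)  = " + oneGivenNode);
        System.out.println("P(all on a single node) = " + anySingleNode);
    }
}
```

This assumes a uniform hash of the affinity key across nodes, which is only an approximation of Ignite's rendezvous affinity function, but it shows why a handful of affinity keys skews easily.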
To get a better distribution across the cluster, try restructuring your data so that the
affinity-mapped IDs take more distinct values. For example, you could split each measurement into
batches of, say, 1k or 10k entries, assign each batch an ID, and make that batch ID
@AffinityKeyMapped instead of the measurementId.
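A hypothetical sketch of such a batched key. The class name, the batch size, and the batchId derivation are my invention, not something from this thread; Ignite's @AffinityKeyMapped annotation (org.apache.ignite.cache.affinity.AffinityKeyMapped) is only mentioned in a comment so the snippet stays self-contained:

```java
// Sketch of a key whose affinity field is a per-batch ID rather than
// the measurementId, so entries of one measurement spread over nodes.
public class BatchedIgniteKey {
    private static final long BATCH_SIZE = 10_000; // entries per batch (assumed)

    private final String deviceId;
    private final long measurementId;
    private final long batchId;   // in real code: annotate with @AffinityKeyMapped
    private final long timestamp;

    public BatchedIgniteKey(String deviceId, long measurementId,
                            long entryIndex, long timestamp) {
        this.deviceId = deviceId;
        this.measurementId = measurementId;
        // Derive a distinct affinity value for every BATCH_SIZE entries
        // of a measurement (the * 1_000 factor keeps batch IDs of
        // different measurements apart; purely illustrative).
        this.batchId = measurementId * 1_000 + entryIndex / BATCH_SIZE;
        this.timestamp = timestamp;
    }

    public long batchId() {
        return batchId;
    }
}
```

With this layout, collocated computations would have to work per batch (or gather batches of one measurement), which is the trade-off for the even spread.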


From: svonn
Sent: 27 January 2018 15:46
To: user@ignite.apache.org
Subject: RE: Question about data distribution


My class for my keys looks like this.

public class IgniteKey {
    private String deviceId;

    @AffinityKeyMapped
    private long measurementId;

    private long timestamp;

    public IgniteKey(String deviceId, long measurementId, long timestamp) {
        this.deviceId = deviceId;
        this.measurementId = measurementId;
        this.timestamp = timestamp;
    }
}

One device can have multiple measurements, but as of now any calculation only
requires other entries from the same measurement, so only the measurementId
should be relevant.

One measurement contains 100k - 200k entries in one stream, and 500-1000 in
the other stream. Both streams use the same class for keys.

Whenever a new measurementId arrives I log which node it is being processed
on. I've had the following case:
Measurement 1 (short M1) -> node1
M2 -> node1
M3 -> node2
M4 -> node1
M5 -> node1
M6 -> node1

I expected that even M2 would already be placed on node2. However,
performance-wise, I don't think either node is close to its limit; I'm not
sure whether that is also relevant.
Due to the 5 min expiry policy I can end up with one node holding ~1 million
cache entries while the other has 0.

- svonn

Sent from: http://apache-ignite-users.70518.x6.nabble.com/
