I dug into this a bit more, and it is a perfect storm of how the write-behind works and how we use one of our caches. We need to persist our Kafka offsets, so we have a cache keyed by topic and partition, and whenever we receive a record from a given topic/partition we update that key's value. When we are very busy, messages arrive constantly. Their contents are distributed across many caches, but every offset update goes to the same key in the same cache. When that cache is flushed to disk, the coalescing logic keeps locking that key and contends with the main thread that is trying to update it.
Turning off coalescing does not seem to help. First, if I am reading the code correctly, applyBatch still takes locks after the call to updateStore, and without coalescing it will take the lock on the same entry over and over. Second, because we rewrite that key constantly, the write-behind cannot keep up without coalescing.
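To illustrate why the write-behind falls behind without coalescing, here is a minimal Java sketch. It assumes a simplified model of write-behind rather than the actual code discussed above: a coalescing buffer keeps only the latest pending value per key, while a non-coalescing buffer queues every update. The names TopicPartition, CoalescingBuffer, QueueingBuffer, OffsetDemo and the "orders" topic are hypothetical. It models only the number of store writes, not the per-key locking in applyBatch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical offset key: one cache entry per (topic, partition) pair. */
final class TopicPartition {
    final String topic;
    final int partition;

    TopicPartition(String topic, int partition) {
        this.topic = topic;
        this.partition = partition;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TopicPartition)) return false;
        TopicPartition other = (TopicPartition) o;
        return partition == other.partition && topic.equals(other.topic);
    }

    @Override
    public int hashCode() {
        return Objects.hash(topic, partition);
    }
}

/** Simplified coalescing buffer: a newer value for a pending key replaces the older one. */
final class CoalescingBuffer<K, V> {
    private final Map<K, V> pending = new ConcurrentHashMap<>();
    final List<V> storeWrites = new ArrayList<>(); // stands in for the backing store

    void put(K key, V value) {
        pending.put(key, value); // overwrites any value not yet flushed
    }

    void flush() {
        // ConcurrentHashMap iterators tolerate removal during iteration
        for (K key : pending.keySet()) {
            V value = pending.remove(key);
            if (value != null) {
                storeWrites.add(value); // at most one write per key per flush
            }
        }
    }
}

/** Simplified non-coalescing buffer: every update becomes its own store write. */
final class QueueingBuffer<V> {
    private final List<V> pending = new ArrayList<>();
    final List<V> storeWrites = new ArrayList<>();

    void put(V value) {
        pending.add(value);
    }

    void flush() {
        storeWrites.addAll(pending);
        pending.clear();
    }
}

final class OffsetDemo {
    /** Returns {coalesced store writes, queued store writes, last coalesced offset}. */
    static long[] simulate(int updates) {
        TopicPartition key = new TopicPartition("orders", 0); // hypothetical topic/partition
        CoalescingBuffer<TopicPartition, Long> coalescing = new CoalescingBuffer<>();
        QueueingBuffer<Long> queueing = new QueueingBuffer<>();
        for (long offset = 0; offset < updates; offset++) {
            coalescing.put(key, offset); // same key rewritten on every record
            queueing.put(offset);
        }
        coalescing.flush();
        queueing.flush();
        return new long[] {
            coalescing.storeWrites.size(),
            queueing.storeWrites.size(),
            coalescing.storeWrites.get(0)
        };
    }

    public static void main(String[] args) {
        long[] r = simulate(1000);
        System.out.println("coalesced store writes: " + r[0] + ", last offset: " + r[2]);
        System.out.println("non-coalesced store writes: " + r[1]);
    }
}
```

With 1000 updates to one topic/partition, the coalescing buffer issues a single store write carrying the latest offset, while the non-coalescing buffer issues 1000. That write amplification on a single hot key is what lets the flusher fall behind when coalescing is off.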
Now that we understand what is going on, we can work around this.
Two quick questions:
- We are on 2.1. Has anything changed in this area in 2.3 that might make this better?
- Is this use case of repeatedly updating the same key unique to us, or is it common enough that the coalesce code should be fixed?