Alexey,

I dug into this a bit more, and it is a perfect storm of how write-behind works and how we are using one of our caches.  We need to keep our Kafka offsets persisted, so we have a cache whose key is a topic and partition.  Every time we get a record from that combination we update the value.  When we are very busy we are constantly receiving messages; the contents of each message get distributed across many caches, but the offset update always goes to the same cache with the same key.  When that cache is flushed to disk, the coalescing logic keeps locking that key, putting it in contention with the main thread trying to update it.  Turning off coalescing does not seem to help either.  First, if I am reading the code correctly, applyBatch still takes locks after the call to updateStore, and without coalescing we take the lock on the same value over and over.  Second, because we rewrite that key constantly, without coalescing the write-behind cannot keep up.
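For reference, a rough sketch of the offsets cache setup I am describing (all names, types, and values here are illustrative, not our real code; the setters are from Ignite's CacheConfiguration API):

```java
// Illustrative sketch only: a write-behind cache keyed by topic+partition,
// where every consumed record rewrites the same key with the latest offset.
CacheConfiguration<String, Long> offsetsCfg = new CacheConfiguration<>("kafkaOffsets");
offsetsCfg.setWriteThrough(true);
offsetsCfg.setWriteBehindEnabled(true);
offsetsCfg.setWriteBehindBatchSize(500);    // the batch size mentioned below
offsetsCfg.setWriteBehindCoalescing(true);  // the default; where we see contention
// offsetsCfg.setCacheStoreFactory(...);    // our CacheStore implementation (omitted)

IgniteCache<String, Long> offsets = ignite.getOrCreateCache(offsetsCfg);

// On every consumed record the same key is rewritten:
offsets.put(record.topic() + ":" + record.partition(), record.offset());
```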

Now that we understand what is going on we can work around this.  

Two quick questions:
- We are on 2.1; has anything changed in this area in 2.3 that might make this better?
- Is this use case of constantly updating the same key unique to us, or is it common enough that the coalescing code should be fixed?

Best,

Larry


On Fri, Nov 3, 2017 at 5:14 PM, Larry Mark <larry.mark@principled.io> wrote:
Alexey,

With our use case, turning coalescing off will probably make things worse: for at least some caches we are doing many updates to the same key, which is one of the reasons I am setting the batch size to 500.

I will send the cache store implementation and some logs that show the phenomenon early next week.  Thanks for your help.

Larry 

On Fri, Nov 3, 2017 at 12:11 PM, Alexey Popov <tank2.alex@gmail.com> wrote:
Hi,

Can you share your cache store implementation?

There could be several reasons for performance degradation in
write-behind mode.
Ignite can start flushing your cache values in the main() thread if the cache
size becomes greater than 1.5 x writeBehindFlushSize. That is a common case,
but it does not look like yours.
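To illustrate (the setter is from Ignite's CacheConfiguration API; the specific numbers are an assumption based on the default flush size):

```java
// Sketch of the overflow threshold described above.
// Assuming the default writeBehindFlushSize of 10240, synchronous flushing
// from the caller's thread can kick in once the write-behind buffer grows
// past roughly 1.5 x 10240 = 15360 pending entries.
CacheConfiguration<String, Long> cfg = new CacheConfiguration<>("myCache");
cfg.setWriteBehindEnabled(true);
cfg.setWriteBehindFlushSize(10240); // buffer target; overflow threshold is ~1.5x this
```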

The write-behind implementation can use a ReentrantReadWriteLock while you
insert/update cache entries in your main thread; the write-behind background
threads take these locks when they read and flush entries.
That implementation is used when write coalescing is turned on via
setWriteBehindCoalescing(true); BTW, the default value is TRUE. It really
only makes sense when you configure several flush threads
(setWriteBehindFlushThreadCount(X)) so that multiple reads and writes can
actually run concurrently.

It is hard to believe that this could slow down your main() thread, but please
check: just add setWriteBehindCoalescing(false) to your config and run your
tests again.
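Something like this (a minimal sketch; the cache name and thread count are placeholders, the setters are from CacheConfiguration):

```java
// The suggested experiment: coalescing off, plus several flush threads.
CacheConfiguration<String, Long> cfg = new CacheConfiguration<>("myCache");
cfg.setWriteBehindEnabled(true);
cfg.setWriteBehindCoalescing(false);   // the change to test
cfg.setWriteBehindFlushThreadCount(4); // several flush threads, as noted above
```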

Thanks,
Alexey