As a rule of thumb, assume that all the rows you read in a page end up on the heap at the same time.
If you’re reading 1,000 rows of 100 bytes each, that’s no big deal: you have about 100 KB per read thread on the heap.
If you’re reading 100 rows of 1 MB each, you now have 100 MB per thread on the heap.
Assuming an 8 GB heap with a 2 GB young generation, the first example is probably no problem even with dozens of concurrent reads, but the second will trigger a young GC every 10 to 15 reads (and objects may be promoted to the old generation, depending on how many concurrent reads you’re doing).
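The arithmetic above can be written out as a small back-of-envelope script. This is a sketch under stated assumptions, not something Cassandra computes: it assumes every row of a page is live on the heap at once (the rule of thumb above), and it assumes a HotSpot-style young generation with -XX:SurvivorRatio=8, so eden is 8/10 of the young gen. The helper names are hypothetical.

```python
"""Back-of-envelope heap estimate for one page of Cassandra rows.

Assumptions (illustrative, not measured):
  * every row in a page is live on the heap at the same time;
  * young gen = eden + two survivor spaces, with eden:survivor = survivor_ratio:1
    (HotSpot -XX:SurvivorRatio; 8 is assumed here).
"""

MB = 1_000_000  # decimal megabytes, matching the figures in the thread
GB = 1_000 * MB


def page_heap_bytes(rows_per_page: int, avg_row_bytes: int) -> int:
    """Heap held by one in-flight page if all its rows are resident at once."""
    return rows_per_page * avg_row_bytes


def eden_bytes(young_gen_bytes: int, survivor_ratio: int = 8) -> int:
    """Eden size when young gen = eden + 2 survivors and eden:survivor = ratio:1."""
    return young_gen_bytes * survivor_ratio // (survivor_ratio + 2)


def reads_per_young_gc(page_bytes: int, young_gen_bytes: int, survivor_ratio: int = 8) -> int:
    """Upper bound on page reads that fit in eden before a young GC is triggered.

    Ignores every other allocation on the node, so the real interval is shorter.
    """
    return eden_bytes(young_gen_bytes, survivor_ratio) // page_bytes


small = page_heap_bytes(1_000, 100)   # 1,000 rows of 100 bytes
large = page_heap_bytes(100, 1 * MB)  # 100 rows of 1 MB
young = 2 * GB                        # 2 GB young generation

print(small, large)                      # → 100000 100000000
print(reads_per_young_gc(small, young))  # → 16000
print(reads_per_young_gc(large, young))  # → 16
print(24 * small)                        # heap for 24 concurrent small pages → 2400000
```

Even this optimistic bound fills eden after about 16 large pages, which lands near the 10 to 15 reads quoted above once the node's other allocations are counted. Two dozen concurrent small pages, by contrast, hold only about 2.4 MB.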
Kurt, thank you very much for your answer! Your remark on GC completely changed how I think about Cassandra's resource usage.
So, here are a few more questions for the audience.
What is generally considered:
1) a "too large" page size,
2) a "large" page size,
3) a page size for "normal conditions"?
How exactly does fetch size affect CPU? Can too large a page size cause heavy CPU usage from constant GC, and thus hurt Cassandra's performance on read requests (because the CPU is busy garbage collecting instead of doing other work)?
Thank you all very much!
1) Am I correct to assume that the larger the page size a user session sets, the larger the share of cluster/coordinator node resources that session will hog?
2) Do I understand correctly that page size (assuming no timeout settings) is limited by the RAM and IOPS I am willing to allot to a single user session?
Yes to both. With a larger page size, more rows are pulled into memory at the same time, using more memory and I/O.
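Since both answers are yes, the page size can be bounded from the memory side. The sketch below is a hypothetical sizing helper, not part of Cassandra or any driver: it assumes each session's whole page sits on the coordinator's heap at once and that you have chosen a heap budget for in-flight pages, shared evenly across concurrent sessions. The 200 MB budget and the session counts are made-up illustrations.

```python
"""Hypothetical sizing rule: largest page size that fits a per-session heap budget.

Assumes rows_per_page * avg_row_bytes is held on the heap per in-flight page,
and that a node-wide budget is split evenly across concurrent sessions.
"""

MB = 1_000_000  # decimal megabytes


def max_page_size(node_budget_bytes: int, concurrent_sessions: int, avg_row_bytes: int) -> int:
    """Largest rows-per-page such that every session's in-flight page fits the budget."""
    if concurrent_sessions <= 0 or avg_row_bytes <= 0:
        raise ValueError("concurrent_sessions and avg_row_bytes must be positive")
    per_session = node_budget_bytes // concurrent_sessions
    rows = per_session // avg_row_bytes
    if rows < 1:
        raise ValueError("budget cannot hold even one row per session")
    return rows


budget = 200 * MB  # illustrative heap budget for in-flight pages on one coordinator

print(max_page_size(budget, 20, 100))   # 20 sessions, 100-byte rows → 100000
print(max_page_size(budget, 20, MB))    # 20 sessions, 1 MB rows → 10
print(max_page_size(budget, 200, MB))   # 200 sessions, 1 MB rows → 1
```

The same budget that comfortably allows 100,000-row pages of small rows allows only a handful of 1 MB rows, and it shrinks further as sessions are added, which is why one session with a large page size takes a larger share of the coordinator. With the DataStax Python driver, a value computed this way would be passed as fetch_size on a statement.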
3) Am I correct to assume that the page size/read request timeout I allow directly determines the chance of locking a node to a single user's requests?
Reads can run concurrently on a node, so a large page size shouldn't "lock" the node to a single user's request. However, you can overload the node, which may effectively amount to the same thing. Don't set page sizes too high, or the coordinator of the query will end up doing a lot of GC.