cassandra-commits mailing list archives

From "Yasuharu Goto (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-12731) Remove IndexInfo cache from FileIndexInfoRetriever.
Date Thu, 29 Sep 2016 16:26:20 GMT
Yasuharu Goto created CASSANDRA-12731:
-----------------------------------------

             Summary: Remove IndexInfo cache from FileIndexInfoRetriever.
                 Key: CASSANDRA-12731
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12731
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Yasuharu Goto


Hi guys.
In the patch for CASSANDRA-11206, I found that FileIndexInfoRetriever allocates a very large
IndexInfo array (sized up to the number of IndexInfo entries in the RowIndexEntry) as a cache on every
single read path.
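To make the allocation pattern concrete, here is a minimal sketch. It is not Cassandra's code: `IndexInfo`, `ArrayCachingRetriever`, `findBlock` and the `loader` function are hypothetical stand-ins, and it assumes a read locates its block with a binary search over the index entries. The retriever allocates one array slot per entry up front, while a single lookup only loads about log2(N) of them.

```java
import java.util.function.IntFunction;

public class ArrayCacheSketch {
    // Hypothetical stand-in for one index block descriptor; not Cassandra's IndexInfo.
    static final class IndexInfo {
        final long firstPosition;
        IndexInfo(long firstPosition) { this.firstPosition = firstPosition; }
    }

    // Hypothetical retriever created once per read; caches entries in an array
    // with one slot per index entry, allocated when the retriever is built.
    static final class ArrayCachingRetriever {
        private final IndexInfo[] cache;
        private final IntFunction<IndexInfo> loader; // simulates deserializing entry i
        int loads = 0;                               // entries actually deserialized

        ArrayCachingRetriever(int entryCount, IntFunction<IndexInfo> loader) {
            this.cache = new IndexInfo[entryCount]; // O(entryCount) allocation per read
            this.loader = loader;
        }

        IndexInfo get(int i) {
            if (cache[i] == null) {
                cache[i] = loader.apply(i);
                loads++;
            }
            return cache[i];
        }

        int slots() { return cache.length; }
    }

    // Binary search: index of the last entry whose firstPosition <= target, or -1.
    static int findBlock(ArrayCachingRetriever r, long target) {
        int lo = 0, hi = r.slots() - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (r.get(mid).firstPosition <= target) { result = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // One million index entries, each covering a hypothetical 64-byte block.
        ArrayCachingRetriever r = new ArrayCachingRetriever(1_000_000, i -> new IndexInfo(i * 64L));
        int block = findBlock(r, 123_456L * 64);
        System.out.println("block=" + block + " slots=" + r.slots() + " loads=" + r.loads);
    }
}
```

In this sketch the million-slot array is paid for on every read even though the search deserializes at most 20 entries.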

After some experiments with LargePartitionsTest, I found that removing the IndexInfo cache from
FileIndexInfoRetriever improves performance for large partitions, as shown below (latencies were
reduced by 41% and by 45%).

{noformat}
// LargePartitionsTest.test_13_4G with cache by array
INFO  [main] 2016-09-29 23:11:25,763 ?:? - SELECTs 1 for part=4194304k total=16384M took 94197 ms
INFO  [main] 2016-09-29 23:12:50,914 ?:? - SELECTs 2 for part=4194304k total=16384M took 85151 ms

// LargePartitionsTest.test_13_4G without cache
INFO  [main] 2016-09-30 00:13:26,050 ?:? - SELECTs 1 for part=4194304k total=16384M took 55112 ms
INFO  [main] 2016-09-30 00:14:12,132 ?:? - SELECTs 2 for part=4194304k total=16384M took 46082 ms
{noformat}
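The reduction percentages can be checked directly from the timings in the log lines. This small helper (the name `reductionPercent` is just for illustration) computes 100 × (before − after) / before, which gives roughly 41.5% for SELECTs 1 and 45.9% for SELECTs 2; the 41% and 45% quoted above are these values truncated.

```java
public class LatencyReduction {
    // Percentage reduction from 'beforeMs' to 'afterMs' (both latencies in ms).
    static double reductionPercent(long beforeMs, long afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        // Timings copied from the test_13_4G log lines (with cache vs. without cache).
        System.out.printf("SELECTs 1: %.1f%%%n", reductionPercent(94197, 55112));
        System.out.printf("SELECTs 2: %.1f%%%n", reductionPercent(85151, 46082));
    }
}
```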

Code is [here|https://github.com/matope/cassandra/commit/86fb910a0e38f7520e1be40fb42f74a692f2ebce]
(based on trunk)

Of course, I also tried several collection containers in place of the plain array, but none of them
improved performance enough to justify keeping a cache.
(Unless I made a mistake or overlooked something in this test.)

|| Variant (LargePartitionsTest.test_12_2G) || SELECTs 1 (ms) || SELECTs 2 (ms) || Scan (ms) ||
| Original (array) | 62736 | 48562 | 41540 |
| ConcurrentHashMap 1st | 47597 | 30854 | 18271 |
| ConcurrentHashMap 2nd | 44036 | 26895 | 17443 |
| LinkedHashCache (capacity=16, limit=10, fifo) 1st | 42668 | 32165 | 17323 |
| LinkedHashCache (capacity=16, limit=10, fifo) 2nd | 48863 | 28066 | 18053 |
| LinkedHashCache (capacity=16, limit=16, fifo) | 46979 | 29810 | 18620 |
| LinkedHashCache (capacity=16, limit=10, lru) | 46456 | 29749 | 20311 |
| No Cache 1st | 47579 | 32480 | 18337 |
| No Cache 2nd | 46534 | 27670 | 18700 |

Code that I used for this comparison is [here|https://github.com/matope/cassandra/commit/e12fcac77f0f46bdf4104ef21c6454bfb2bb92d0].
LinkedHashCache is a simple FIFO/LRU cache that extends LinkedHashMap.
Scan is the execution time to iterate through the whole large partition.
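For readers who have not built one, a FIFO/LRU cache on top of LinkedHashMap usually looks like the sketch below. This is a reconstruction, not the code from the linked commit; it assumes "capacity" is the map's initial capacity and "limit" is the maximum number of retained entries. The `accessOrder` constructor flag chooses between insertion order (FIFO) and access order (LRU), and `removeEldestEntry` evicts the head of the list once the limit is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reconstruction of a LinkedHashMap-based cache.
class LinkedHashCache<K, V> extends LinkedHashMap<K, V> {
    private final int limit; // assumed: maximum number of entries kept

    // lru=false keeps insertion order (FIFO eviction); lru=true keeps access order (LRU eviction).
    LinkedHashCache(int capacity, int limit, boolean lru) {
        super(capacity, 0.75f, lru);
        this.limit = limit;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > limit; // drop the eldest entry after an insert pushes us over the limit
    }
}

public class LinkedHashCacheDemo {
    public static void main(String[] args) {
        LinkedHashCache<Integer, String> fifo = new LinkedHashCache<>(16, 2, false);
        fifo.put(1, "a"); fifo.put(2, "b"); fifo.get(1); fifo.put(3, "c");
        System.out.println("fifo keys: " + fifo.keySet()); // 1 evicted despite the get()

        LinkedHashCache<Integer, String> lru = new LinkedHashCache<>(16, 2, true);
        lru.put(1, "a"); lru.put(2, "b"); lru.get(1); lru.put(3, "c");
        System.out.println("lru keys: " + lru.keySet());   // 2 evicted, 1 was recently used
    }
}
```

Unlike ConcurrentHashMap, LinkedHashMap is not thread-safe, and in LRU mode even `get()` mutates the internal ordering, so such a cache is only safe when confined to one thread (for example, one retriever per read).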

So, in this issue, I'd like to propose removing the IndexInfo cache from FileIndexInfoRetriever
to improve performance on large partitions.
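Under the proposal, the retriever would simply deserialize whatever entry a lookup asks for and keep nothing. The sketch below shows that shape using the same hypothetical stand-ins as before (`IndexEntry`, `UncachedRetriever` and the `reader` function are illustrative names, not Cassandra's API). No per-read array is allocated; the trade-off is that repeated lookups re-read entries, which the benchmarks above suggest is cheaper than building the cache.

```java
import java.util.function.IntFunction;

public class UncachedSketch {
    // Hypothetical stand-in for one index entry; not Cassandra's IndexInfo.
    static final class IndexEntry {
        final long firstPosition;
        IndexEntry(long firstPosition) { this.firstPosition = firstPosition; }
    }

    // Hypothetical cache-free retriever: every get() re-reads the entry; nothing is retained.
    static final class UncachedRetriever {
        private final int entryCount;
        private final IntFunction<IndexEntry> reader; // simulates deserializing entry i from disk
        int reads = 0;                                // total entries read so far

        UncachedRetriever(int entryCount, IntFunction<IndexEntry> reader) {
            this.entryCount = entryCount;
            this.reader = reader;
        }

        IndexEntry get(int i) {
            reads++;
            return reader.apply(i);
        }

        // Binary search: index of the last entry whose firstPosition <= target, or -1.
        int findBlock(long target) {
            int lo = 0, hi = entryCount - 1, result = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (get(mid).firstPosition <= target) { result = mid; lo = mid + 1; }
                else hi = mid - 1;
            }
            return result;
        }
    }

    public static void main(String[] args) {
        UncachedRetriever r = new UncachedRetriever(1000, i -> new IndexEntry(i * 64L));
        System.out.println("block=" + r.findBlock(130) + " reads=" + r.reads);
    }
}
```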



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
