cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict Elliott Smith (Jira)" <>
Subject [jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
Date Mon, 09 Mar 2020 09:18:00 GMT


Benedict Elliott Smith commented on CASSANDRA-15397:

bq. I couldn't do so since the code uses a generic that's comparable

The {{IntervalTree}} is used in precisely one place in the codebase, so it would be possible
to hardcode to this use case for improved performance.

bq.  I'm not sure if assuming long will be a good idea

I would be very surprised if it is not significantly faster.

> IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

> ------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-15397
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/SSTable
>            Reporter: Chandrasekhar Thumuluru
>            Assignee: Chandrasekhar Thumuluru
>            Priority: Low
>              Labels: pull-request-available
>         Attachments: 90p_100k_sstables_with_1000_searches.png, 90p_1million_sstables_with_1000_searches.png,
90p_250k_sstables_with_1000_searches.png, 90p_500k_sstables_with_1000_searches.png, 90p_750k_sstables_with_1000_searches.png,
95p_10000_SSTable_with_5000_Searches.png, 95p_100k_sstables_with_1000_searches.png, 95p_15000_SSTable_with_5000_Searches.png,
95p_1million_sstables_with_1000_searches.png, 95p_20000_SSTable_with_5000_Searches.png, 95p_25000_SSTable_with_5000_Searches.png,
95p_250k_sstables_with_1000_searches.png, 95p_30000_SSTable_with_5000_Searches.png, 95p_5000_SSTable_with_5000_Searches.png,
95p_500k_sstables_with_1000_searches.png, 95p_750k_sstables_with_1000_searches.png, 99p_10000_SSTable_with_5000_Searches.png,
99p_100k_sstables_with_1000_searches.png, 99p_15000_SSTable_with_5000_Searches.png, 99p_1million_sstables_with_1000_searches.png,
99p_20000_SSTable_with_5000_Searches.png, 99p_25000_SSTable_with_5000_Searches.png, 99p_250k_sstables_with_1000_searches.png,
99p_30000_SSTable_with_5000_Searches.png, 99p_5000_SSTable_with_5000_Searches.png, 99p_500k_sstables_with_1000_searches.png,
99p_750k_sstables_with_1000_searches.png,,,, Mean_10000_SSTable_with_5000_Searches.png, Mean_100k_sstables_with_1000_searches.png,
Mean_15000_SSTable_with_5000_Searches.png, Mean_1million_sstables_with_1000_searches.png,
Mean_20000_SSTable_with_5000_Searches.png, Mean_25000_SSTable_with_5000_Searches.png, Mean_250k_sstables_with_1000_searches.png,
Mean_30000_SSTable_with_5000_Searches.png, Mean_5000_SSTable_with_5000_Searches.png, Mean_500k_sstables_with_1000_searches.png,
Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, replace_intervaltree_with_intervallist.patch
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
> Cassandra uses IntervalTrees to identify the SSTables that overlap with search interval.
In Cassandra, IntervalTrees are not mutated. They are recreated each time a mutation is required.
This can be an issue during repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided to compare
the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start and end
points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost always performs
similar to IntervalTree or out performs IntervalTree based search. The cost of IntervalTree
construction is also substantial and produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another randomly generated
search interval with 5000 iterations. I'm attaching all the relevant graphs. The x-axis in
the graphs is the search interval coverage. 10p means the search interval covered 10% of the
intervals. The y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data portion
of the interval.  Modified the template version (Java generics) to a specialized version.

> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 

This message was sent by Atlassian Jira

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message