cassandra-commits mailing list archives

From "Daniel Norberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4710) High key hashing overhead for index scans when using RandomPartitioner
Date Tue, 25 Sep 2012 02:16:07 GMT


Daniel Norberg commented on CASSANDRA-4710:

The check against DatabaseDescriptor.getIndexInterval makes it possible to exit the loop
when the key being looked up is not present in the index.

When comparing tokens, the loop can be exited as soon as an index entry with a token greater
than the needle's is encountered, since the index is sorted on token. I.e. the if (v < 0) return
null. But when comparing raw keys, we have to examine every entry in the section of the index
that the sampled index pointed us to before we can conclude that the key is not present.
Fortunately this should be rare, as key presence is checked against the bloom filter for EQ
lookups before the index is read.
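The two exit strategies can be sketched as follows. This is a minimal illustration, not Cassandra's actual SSTableReader code; the IndexEntry class, the method names, and the explicit indexInterval parameter are all hypothetical:

```java
import java.util.Arrays;
import java.util.List;

public class IndexScanSketch {
    // Hypothetical index entry: a decorated token, the raw key, and a data file position.
    static final class IndexEntry {
        final long token;
        final byte[] key;
        final long position;
        IndexEntry(long token, byte[] key, long position) {
            this.token = token;
            this.key = key;
            this.position = position;
        }
    }

    // Token comparison: the index section is sorted on token, so the scan can
    // stop as soon as it passes the needle's token (the "if (v < 0) return null" case).
    static Long findByToken(List<IndexEntry> section, long needleToken) {
        for (IndexEntry e : section) {
            int v = Long.compare(needleToken, e.token);
            if (v == 0) return e.position;
            if (v < 0) return null; // passed where the needle would be: key not present
        }
        return null;
    }

    // Raw key comparison: raw keys are not sorted, so absence can only be
    // concluded after examining the whole section; the index-interval bound
    // (cf. DatabaseDescriptor.getIndexInterval) caps how many entries that is.
    static Long findByRawKey(List<IndexEntry> section, byte[] needleKey, int indexInterval) {
        int examined = 0;
        for (IndexEntry e : section) {
            if (examined++ >= indexInterval) break; // end of the sampled section
            if (Arrays.equals(e.key, needleKey)) return e.position;
        }
        return null;
    }
}
```

Without the interval bound, the raw-key loop would have no termination condition for a key that is absent from the index.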
> High key hashing overhead for index scans when using RandomPartitioner
> ----------------------------------------------------------------------
>                 Key: CASSANDRA-4710
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Daniel Norberg
>         Attachments: 0001-SSTableReader-compare-raw-key-when-scanning-index.patch
> For a workload where the dataset is completely in RAM, the MD5 hashing of the keys during
index scans becomes a bottleneck for reads when using RandomPartitioner, according to profiling.
> Instead, performing a raw key equals check in SSTableReader.getPosition() for EQ operations
improves throughput by some 30% for my workload (moving the bottleneck elsewhere).
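The optimization described above can be illustrated like this. It is a simplified sketch, not the attached patch: RandomPartitioner decorates keys with an MD5-based token, and the method names here are hypothetical:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class KeyCompareSketch {
    // RandomPartitioner-style decoration: the token is an MD5 digest of the key.
    // Doing this for every index entry examined during a scan is the hashing overhead.
    static byte[] md5Token(byte[] key) {
        try {
            return MessageDigest.getInstance("MD5").digest(key);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is a required JDK algorithm", e);
        }
    }

    // Before: equality decided by decorating both keys and comparing the tokens.
    static boolean eqViaTokens(byte[] entryKey, byte[] needleKey) {
        return Arrays.equals(md5Token(entryKey), md5Token(needleKey));
    }

    // After: for EQ operations, compare the raw key bytes and hash nothing.
    static boolean eqRaw(byte[] entryKey, byte[] needleKey) {
        return Arrays.equals(entryKey, needleKey);
    }
}
```

Both checks agree for EQ lookups (MD5 collisions aside), but the raw comparison avoids one digest computation per index entry examined, which is the overhead the patch removes.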

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
