cassandra-commits mailing list archives

From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8931) IndexSummary (and Index) should store the token, and the minimal key to unambiguously direct a query
Date Fri, 13 Nov 2015 14:54:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004064#comment-15004064 ]

Robert Stupp commented on CASSANDRA-8931:
-----------------------------------------

Just a rough idea, but maybe we do not need index summaries at all with this ticket (assuming
Murmur3).

> IndexSummary (and Index) should store the token, and the minimal key to unambiguously direct a query
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8931
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8931
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>              Labels: performance
>             Fix For: 3.x
>
>
> Since these files are likely sticking around a little longer, it is probably worth optimising
> them. A relatively simple change to Index and IndexSummary could significantly reduce the
> space required, reduce the CPU burden of lookup, and hopefully bound the amount of space
> needed as key size grows. When writing, we always store the token before the key (if it
> differs from the key); we then truncate the whole record to the minimum length necessary
> to answer an inequality search. Since the data file also contains the key, we can
> corroborate that we have the right key once we've looked it up. Since Bloom filters (BFs)
> are used to reduce unnecessary lookups, we don't save much by ruling the false positives
> out one step earlier.
> An improved follow-up version would be to use a trie of shortest length to answer inequality
> lookups, as this would also ensure that very long keys with common prefixes do not
> significantly increase the size of the index or summary. This would translate to a trie
> index for the summary keying into a static trie page for the index.
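
The truncation scheme in the quoted description can be sketched roughly as follows. This is
only an illustration, not Cassandra code: `token`, `shortest_separator`, `build_index` and
`lookup` are hypothetical names, the token is a stand-in (the first 8 bytes of an MD5 digest,
not Murmur3), and the `records` list stands in for the data file that holds the full keys.
Each index entry keeps only the shortest prefix of "token + key" that still sorts strictly
after the previous record, which is enough to answer an inequality search.

```python
import bisect
import hashlib


def token(key: bytes) -> bytes:
    # Hypothetical stand-in for the partitioner token: a fixed-width,
    # byte-comparable value derived from the key (NOT real Murmur3).
    return hashlib.md5(key).digest()[:8]


def shortest_separator(prev: bytes, cur: bytes) -> bytes:
    # Shortest prefix of `cur` that sorts strictly after `prev`.
    # Requires prev < cur in bytewise order.
    n = 0
    while n < len(prev) and n < len(cur) and prev[n] == cur[n]:
        n += 1
    return cur[: n + 1]


def build_index(keys):
    # Full records (token first, then key), sorted as they would be on disk.
    records = sorted({token(k) + k for k in keys})
    entries = []
    prev = b""
    for rec in records:
        entries.append(shortest_separator(prev, rec))
        prev = rec
    return records, entries


def lookup(entries, records, key: bytes):
    # Inequality search: find the last truncated entry <= the query record.
    q = token(key) + key
    i = bisect.bisect_right(entries, q) - 1
    # The truncated entry cannot prove a match, so corroborate against the
    # full record (the role the data file plays in the description).
    if i >= 0 and records[i] == q:
        return i
    return None


if __name__ == "__main__":
    keys = [b"user:%04d" % i for i in range(200)]
    records, entries = build_index(keys)
    print(sum(map(len, records)), "bytes untruncated")
    print(sum(map(len, entries)), "bytes truncated")
```

Because every truncated entry sorts after the previous full record and no later than its own,
the entries stay in sorted order, and a binary search over them lands on the only position
where the queried key could be stored.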



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
