cassandra-commits mailing list archives

From "Jason Rutherglen (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index
Date Sat, 14 Jul 2012 23:44:34 GMT


Jason Rutherglen updated CASSANDRA-4324:

    Attachment: lucene-core-4.0-SNAPSHOT.jar

FSTMemUsage compares the memory usage of the FST vs. IndexSummary.  

On 1 million keys these are the results:

FST: 39,032,383 bytes
IndexSummary: 43,996,068 bytes

A difference of about 5 megabytes (4,963,685 bytes, roughly 11%).  The FST would be far smaller
if the MD5 hash were not being applied to the key, e.g., it does best with keys that are
sequential, so that prefix compression may be applied.
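
To illustrate why hashing hurts here, the following is a minimal sketch (not part of the patch, and not using the Lucene FST itself) that measures how many characters each key shares with its predecessor in sorted order, for sequential keys versus their MD5 hex digests. The class name, the key format "key%010d", and the use of hex-encoded digests are assumptions made for illustration only. An FST also shares suffixes, so this is only a rough proxy for its actual size.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; it only approximates prefix sharing between sorted keys.
public class PrefixSharingSketch {

    // Hex-encoded MD5 of a key; stands in for the hashing applied to keys (assumption).
    static String md5Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of leading characters that a and b have in common.
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Fraction of all key characters that are shared with the previous key in sorted order.
    static double sharedPrefixFraction(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        long shared = 0;
        long total = 0;
        for (int i = 0; i < sorted.size(); i++) {
            total += sorted.get(i).length();
            if (i > 0) {
                shared += commonPrefixLength(sorted.get(i - 1), sorted.get(i));
            }
        }
        return total == 0 ? 0.0 : (double) shared / total;
    }

    public static void main(String[] args) {
        int n = 100_000; // illustrative sample size, smaller than the 1 million keys above
        List<String> sequential = new ArrayList<>();
        List<String> hashed = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String key = String.format("key%010d", i);
            sequential.add(key);
            hashed.add(md5Hex(key));
        }
        System.out.printf("sequential keys, shared-prefix fraction: %.3f%n",
                sharedPrefixFraction(sequential));
        System.out.printf("md5-hashed keys, shared-prefix fraction: %.3f%n",
                sharedPrefixFraction(hashed));
    }
}
```

Sequential keys should show a much higher shared-prefix fraction than the hashed keys, since MD5 output is effectively uniformly distributed and neighbouring digests share only a few leading characters.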

To run FSTMemUsage, the lucene-core-4.0-SNAPSHOT.jar needs to be added to the lib/ directory.

The patch was generated using 'git diff HEAD~1..HEAD' because 'git diff' after 'git add' did
not work.
> Implement Lucene FST in for key index
> -------------------------------------
>                 Key: CASSANDRA-4324
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jason Rutherglen
>            Assignee: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.2
>         Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, lucene-core-4.0-SNAPSHOT.jar
> The Lucene FST data structure offers a compact and fast system for indexing Cassandra
> keys.  More keys may be loaded, which in turn should make seeks faster.
> * Update the IndexSummary class to make use of the Lucene FST, overriding the serialization
> * Alter SSTableReader to make use of the FST seek mechanism
