cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-2915) Lucene based Secondary Indexes
Date Mon, 18 Jul 2011 18:05:57 GMT
Lucene based Secondary Indexes

                 Key: CASSANDRA-2915
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: T Jake Luciani
             Fix For: 1.0

Secondary indexes (type KEYS) currently suffer from a number of limitations:

   - Multiple IndexClauses only work when there is a subset of rows under the highest clause
   - One new column family is created per index, which means 10 new CFs for 10 secondary indexes

This ticket will use the Lucene library to implement secondary indexes as one index per CF,
and utilize the Lucene query engine to handle multiple index clauses. Also, by using Lucene,
we get a highly optimized file format.
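
As a rough illustration of the one-index-per-CF idea, the toy sketch below keeps a single
inverted index for a column family, mapping (column, value) terms to row keys, and answers a
query with several clauses by intersecting the matching sets of row keys. This is not Lucene's
API: `ColumnFamilyIndex`, `add_row`, and `query` are hypothetical names, and the sample rows
are made up for the example.

```python
from collections import defaultdict


class ColumnFamilyIndex:
    """Toy single index for one column family (hypothetical, not Lucene)."""

    def __init__(self):
        # (column name, value) -> set of row keys that contain that pair
        self._postings = defaultdict(set)

    def add_row(self, row_key, columns):
        """Index every column of a row in the same shared index."""
        for name, value in columns.items():
            self._postings[(name, value)].add(row_key)

    def query(self, clauses):
        """Return the row keys that match every (column, value) equality clause."""
        if not clauses:
            return set()
        # Start from the smallest posting set to keep intermediate results small.
        sets = sorted((self._postings.get(c, set()) for c in clauses), key=len)
        result = set(sets[0])
        for postings in sets[1:]:
            result &= postings
        return result


# Made-up sample data: three rows, two indexed columns, one index.
index = ColumnFamilyIndex()
index.add_row("u1", {"state": "UT", "birth_year": 1975})
index.add_row("u2", {"state": "UT", "birth_year": 1980})
index.add_row("u3", {"state": "NY", "birth_year": 1975})

matches = index.query([("state", "UT"), ("birth_year", 1975)])
```

Here both clauses are resolved against the same structure, so adding a tenth indexed column
adds terms to the index rather than a tenth column family.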

There are a few parallels we can draw between Cassandra and Lucene.

Lucene indexes segments in memory and then flushes them to disk, so we can sync our memtable
flushes to Lucene flushes. Lucene also has optimize(), which correlates to our compaction
process, so these can be synced as well.
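
The parallel can be sketched with a toy model in which a column family flush also flushes the
index's in-memory segment, and a compaction also merges the index's segments. All class and
method names below (`ToySegmentedIndex`, `ToyColumnFamily`, and so on) are hypothetical and
only model the lifecycle; they are neither Cassandra's nor Lucene's real interfaces.

```python
class ToySegmentedIndex:
    """Hypothetical stand-in for an index with in-memory and on-disk segments."""

    def __init__(self):
        self.buffer = {}     # in-memory segment being built
        self.segments = []   # flushed, immutable segments (oldest first)

    def add(self, key, doc):
        self.buffer[key] = doc

    def flush(self):
        # Turn the in-memory segment into an immutable one.
        if self.buffer:
            self.segments.append(dict(self.buffer))
            self.buffer = {}

    def optimize(self):
        # Merge all segments into one; newer segments win on conflicts.
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [merged] if merged else []


class ToyColumnFamily:
    """Hypothetical column family that drives the index from its own lifecycle."""

    def __init__(self):
        self.memtable = {}
        self.sstables = []
        self.index = ToySegmentedIndex()

    def write(self, key, columns):
        self.memtable.setdefault(key, {}).update(columns)
        self.index.add(key, dict(self.memtable[key]))

    def flush(self):
        # Memtable flush and index flush happen together.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}
        self.index.flush()

    def compact(self):
        # Compaction and index optimize happen together.
        merged = {}
        for sstable in self.sstables:
            for key, columns in sstable.items():
                merged.setdefault(key, {}).update(columns)
        self.sstables = [merged] if merged else []
        self.index.optimize()


cf = ToyColumnFamily()
cf.write("a", {"x": 1})
cf.flush()
cf.write("b", {"y": 2})
cf.flush()
counts_before = (len(cf.sstables), len(cf.index.segments))
cf.compact()
counts_after = (len(cf.sstables), len(cf.index.segments))
```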

We will also need to correlate column validators to Lucene tokenizers so the data can be
stored properly. The big win is that once this is done, we can perform complex queries within
a column, such as wildcard searches.
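
A minimal sketch of that correlation, assuming a simple lookup from a validator name to a
function that turns a column value into index terms. The `ANALYZERS` table, `index_terms`,
and `wildcard_match` are hypothetical, and the wildcard matching uses Python's `fnmatch`
rather than Lucene's query engine.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping: validator name -> function producing index terms.
ANALYZERS = {
    "UTF8Type": lambda value: value.lower().split(),  # text: lowercase words
    "LongType": lambda value: [str(int(value))],      # number: one exact term
}


def index_terms(validator, value):
    """Produce the terms to index for a column value, based on its validator."""
    return ANALYZERS[validator](value)


def wildcard_match(terms, pattern):
    """Return True if any term matches a pattern using * and ? wildcards."""
    return any(fnmatchcase(term, pattern) for term in terms)


terms = index_terms("UTF8Type", "Lucene based Secondary Indexes")
found = wildcard_match(terms, "index*")
```

Because text values are split into terms at index time, a pattern such as `index*` can match
a word inside the column value rather than only the whole value.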

The downside of this approach is that we will need to read before write, since documents in
Lucene are written as complete documents. For random workloads with lots of indexed columns,
this means we need to read the document from the index, update it, and write it back.
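
The cost can be made concrete with a toy model in which every single-column update reads the
full stored document, changes one field, and writes the whole document back. `DocumentIndex`
and its counters are hypothetical and only illustrate the access pattern; they do not reflect
how Lucene stores documents internally.

```python
class DocumentIndex:
    """Toy whole-document store that counts reads and writes (hypothetical)."""

    def __init__(self):
        self._docs = {}
        self.reads = 0
        self.writes = 0

    def update_column(self, row_key, column, value):
        # Read the complete existing document first (the read before write).
        self.reads += 1
        doc = dict(self._docs.get(row_key, {}))
        # Change a single column.
        doc[column] = value
        # Write the complete document back, replacing the old version.
        self._docs[row_key] = doc
        self.writes += 1

    def get(self, row_key):
        return self._docs.get(row_key)


store = DocumentIndex()
store.update_column("u1", "state", "UT")
store.update_column("u1", "birth_year", 1975)
store.update_column("u1", "email", "u1@example.com")
```

In this model three independent column writes to the same row cost three reads and three
full-document writes, which is the overhead the ticket warns about for random workloads.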

This message is automatically generated by JIRA.

