cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <>
Subject [jira] Created: (CASSANDRA-1601) Refactor index definitions
Date Mon, 11 Oct 2010 05:48:32 GMT
Refactor index definitions

                 Key: CASSANDRA-1601
             Project: Cassandra
          Issue Type: Improvement
          Components: API
            Reporter: Stu Hood
            Priority: Critical
             Fix For: 0.7.0

h3. Overview
There are a few considerations for defining secondary indexes and row validation that I don't
think have been brought up yet. While the interface is still malleable pre-0.7.0, we should
attempt to make changes that allow for forward compatibility of index/validator schemas.
This is an umbrella ticket for suggesting and debating those changes; separate tickets should
be opened for quick improvements that can be made before 0.7.0.


h3. Index output types
The output (queryable) data from an indexing operation is what actually goes in the index.
For a particular row, the output can be either _single-valued_, _multi-valued_ or _compound_:
* Single-valued
** Implemented in trunk (special case of multi-valued)
* Multi-valued
** Multiple index values _of the same type_ can match a single row
** Row probably contains a list/set (perhaps in a supercolumn)
* Compound
** Multiple base properties concatenated as one index entry 
** Different validators/comparators for each component
** (Given the simplicity of performing boolean operations on CASSANDRA-1472 indexes, compound
local indexes are unlikely to ever be worthwhile, but compound distributed indexes will be: see
comments on CASSANDRA-1599)
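
As a rough illustration of the three output shapes, the sketch below derives index entries from a row modeled as a plain dict. The function names, the row layout and the column names are hypothetical and are not taken from trunk; the sketch only shows how single-valued output falls out as a special case of multi-valued output, and how a compound entry differs from both.

```python
# Hypothetical sketch: a row is modeled as a dict of column name -> value.
# None of these names exist in Cassandra; they only illustrate output shapes.

def single_valued(row, column):
    """At most one index entry per row: the value of a single column."""
    return [row[column]] if column in row else []

def multi_valued(row, columns):
    """Several entries of the same type may point at the same row."""
    return [row[c] for c in columns if c in row]

def compound(row, columns):
    """One entry per row, built by combining several base properties."""
    if not all(c in row for c in columns):
        return []
    return [tuple(row[c] for c in columns)]

row = {"state": "UT", "birth_year": 1982, "tag1": "a", "tag2": "b"}
single = single_valued(row, "state")
multi = multi_valued(row, ["tag1", "tag2"])
comp = compound(row, ["state", "birth_year"])
```

Here {{single_valued(row, "state")}} gives the same result as {{multi_valued(row, ["state"])}}, which is the sense in which single-valued output is a special case.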

h3. Index input types
The other end of indexing is selection of values from a row to be indexed. Selection can correspond
directly to our current {{db.filter.*}} implementations, and may be best implemented by specifying
the validator/index using the same Thrift objects you would use for a similar query:
* Name selection
** Implemented in trunk, but should probably just be a special case of list selection below
** Corresponds to db.filter.NamesQueryFilter of size 1
* List selection
** Should specify a list of columns whose values must all be of the same type, as defined
by the Validator
** Corresponds to db.filter.NamesQueryFilter
* Range (prefix?) selection
** Subsets of a row may be interesting for indexing
** Range corresponds to db.filter.SliceQueryFilter
*** (A Prefix might actually be more useful for indexing, but is better implemented by indexing
an arbitrarily nested row)
** Open question: might the ability to index only the 'top N values' from a row be useful?
If so, then this selector should allow N to be specified as it would be for a slice

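The selectors above can be sketched roughly as follows, again over a row modeled as a dict. This is only an illustration under stated assumptions: the function names are hypothetical, the range is treated as inclusive at both ends, and plain string ordering stands in for the column comparator. It is not the {{db.filter.*}} code.

```python
# Hypothetical selectors; they mimic the shape of names/slice queries only.

def select_names(row, names):
    """List selection: pick the named columns that exist in the row."""
    return [(n, row[n]) for n in names if n in row]

def select_name(row, name):
    """Name selection, expressed as a list selection of size 1."""
    return select_names(row, [name])

def select_range(row, start, finish, count=None):
    """Range selection: columns with start <= name <= finish, in sorted
    order, optionally limited to the first `count` columns."""
    selected = [(n, row[n]) for n in sorted(row) if start <= n <= finish]
    return selected if count is None else selected[:count]

row = {"age": 30, "tag:a": 1, "tag:b": 2, "tag:c": 3}
by_name = select_name(row, "age")
by_list = select_names(row, ["tag:a", "tag:c", "missing"])
top_two = select_range(row, "tag:", "tag:~", count=2)
```

The {{count}} argument plays the role of N in the open question above: the selector would index only the first N columns of the range.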
h3. Supercolumns/arbitrary-nesting
Another consideration is that we should be able to support indexing and validation of supercolumns
(and hence, arbitrarily nested rows). Since the selection of columns to index is essentially
the same as the selection of columns to return for a query, this can probably mirror (and
suggest improvements to) our query API.
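
One way to picture selection from an arbitrarily nested row is a path of names that is followed level by level before the columns at that level are selected. The sketch below assumes nested dicts stand in for supercolumns; {{select_path}} is a hypothetical helper, not an existing API.

```python
# Hypothetical sketch: nested dicts stand in for supercolumns.

def select_path(row, path):
    """Follow a path of names through nested dicts and return the
    columns found at the end of the path as (name, value) pairs."""
    node = row
    for name in path:
        if not isinstance(node, dict) or name not in node:
            return []  # the path does not exist in this row
        node = node[name]
    if isinstance(node, dict):
        return sorted(node.items())  # all columns under the path
    return [(path[-1], node)]  # the path ended at a single column

row = {"name": "stu", "address": {"city": "SLC", "zip": "84101"}}
whole_supercolumn = select_path(row, ("address",))
one_subcolumn = select_path(row, ("address", "city"))
```

The same path-plus-selector shape could serve both for choosing what to index and for choosing what a query returns.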

h3. UDFs
This is obviously still an open area, but user defined indexing functions are essentially
a transform between the _input_ and _output_ (as defined above), which would normally have
equal structures. Leaving room for UDFs in our index definitions makes sense, and will likely
lead to a much more general and elegant design.
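
A minimal sketch of that idea: an index definition is a selector (the input) plus an optional transform (the UDF) that produces the entries to store (the output). With no transform the output has the same structure as the input. All names here are hypothetical.

```python
# Hypothetical sketch: an index definition as selector + optional transform.

def build_index_entries(row, selector, transform=None):
    """Select values from the row, then optionally transform them into
    the values that would actually be stored in the index."""
    values = [value for _, value in selector(row)]
    if transform is None:
        return values  # identity: output has the same structure as input
    return transform(values)

def select_email(row):
    """Name selection of the 'email' column."""
    return [("email", row["email"])] if "email" in row else []

def domain_of(values):
    """Example UDF: index the lower-cased domain instead of the address."""
    return [v.split("@", 1)[1].lower() for v in values if "@" in v]

row = {"email": "Stu@Example.com"}
plain = build_index_entries(row, select_email)
by_domain = build_index_entries(row, select_email, domain_of)
```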
