cassandra-user mailing list archives

From Jeremiah Jordan <JEREMIAH.JOR...@morningstar.com>
Subject RE: understanding of native indexes: limitations, potential side effects,...
Date Wed, 16 May 2012 16:23:51 GMT
The limitation exists because the number of columns could be equal to the number of rows. If
the number of rows is large, this can become an issue.

-Jeremiah

________________________________
From: David Vanderfeesten [feestend@gmail.com]
Sent: Wednesday, May 16, 2012 6:58 AM
To: user@cassandra.apache.org
Subject: understanding of native indexes: limitations, potential side effects,...

Hi

I'd like to better understand the limitations of native indexes, their potential side effects,
and the scenarios where they are required.

My understanding so far:
- Each node stores index entries only for the data held locally on that node.
- Indexes do not return values in sorted order (the order is defined by the hashes of the
indexed row keys).
- Given the design described in the first bullet, a coordinator node receiving a read on
a native index needs to send the read to multiple nodes (a set of nodes that together cover at
least the complete key space, plus potentially more to satisfy the read consistency level).
- Each write to an indexed column leads to an additional local read of the index in order to
update it (kind of obvious, but easily forgotten when tuning your system for a write-only workload).
- When using a WHERE clause in CQL, you need to specify at least one equality condition on a
natively indexed column. Additional conditions in the WHERE clause are filtered by the coordinator
node receiving the CQL query.
- Native indexes do not work well for columns with a high number of distinct values across
the entire CF.
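
To make the first, third, and fourth bullets concrete, here is a minimal toy sketch in Python
of how I picture it. This is not Cassandra code: the class names (Node, Cluster), the placement
function, and the read counter are all made up for illustration, and replication and consistency
levels are left out entirely.

```python
from collections import defaultdict


class Node:
    """Toy node: holds its own rows plus an index over only those rows."""

    def __init__(self):
        self.rows = {}                  # row key -> value of the indexed column
        self.index = defaultdict(set)   # column value -> row keys stored on this node
        self.local_reads = 0            # counts the extra reads caused by index upkeep

    def write(self, key, value):
        # Read-before-write: the old value is needed to drop the stale index entry.
        self.local_reads += 1
        old = self.rows.get(key)
        if old is not None:
            self.index[old].discard(key)
            if not self.index[old]:
                del self.index[old]
        self.rows[key] = value
        self.index[value].add(key)

    def lookup(self, value):
        # Only sees rows that live on this node.
        return set(self.index.get(value, ()))


class Cluster:
    """Toy cluster without replication (an assumption made to keep the sketch short)."""

    def __init__(self, node_count):
        self.nodes = [Node() for _ in range(node_count)]

    def _owner(self, key):
        # Hypothetical placement function standing in for the partitioner.
        return self.nodes[sum(key.encode()) % len(self.nodes)]

    def write(self, key, value):
        self._owner(key).write(key, value)

    def query(self, value):
        # The coordinator cannot tell which nodes hold matches, so it asks all of them.
        contacted = 0
        matches = set()
        for node in self.nodes:
            contacted += 1
            matches |= node.lookup(value)
        return matches, contacted
```

In this model a single-value lookup always touches every node, and every write to the indexed
column costs one extra local read, which is the behaviour described in the bullets above.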

Is the above understanding correct and complete?
Some doubts:
- About the limitation on indexing columns with a high number of distinct values:
I assume native indexes are implemented with an internally managed CF per index. With
high-cardinality values, in the worst case, the number of rows in the index is identical to the
number of rows in the indexed CF. Or are there other reasons for the limitation, and if so,
is there a guideline on the maximum cardinality that is still reasonable?
- Are column updates and the corresponding index updates (read + write action) atomic and
isolated from concurrent updates?
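
As a rough illustration of the cardinality doubt, the sketch below counts how many index rows
would exist if the index really is a CF keyed by the indexed value, which is only my assumption
stated above. The function name index_shape and the sample values are made up.

```python
from collections import Counter


def index_shape(values):
    """Return (index rows, widest index row) for a list of indexed column values.

    Assumes one index row per distinct value, with one entry per matching base row.
    """
    per_value = Counter(values)
    index_rows = len(per_value)
    widest_row = max(per_value.values(), default=0)
    return index_rows, widest_row


# Low cardinality: 1000 base rows, only two distinct values.
low = ["BE", "NL"] * 500
# High cardinality: 1000 base rows, every value unique.
high = [f"user{i}@example.com" for i in range(1000)]

low_shape = index_shape(low)
high_shape = index_shape(high)
```

With unique values the index ends up with as many rows as the base CF, each pointing at a
single row, so under this assumption it saves nothing over the base data while still costing
a lookup on every node.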

Txs!

David




