Hi Thomas,

I understand your concerns about ensuring the integrity of your data when having to maintain the indexes yourself. 

In some situations, using Cassandra's built-in secondary indexes is more efficient -- namely when many rows contain the indexed value. Maybe your permissions fall into this category? The obvious advantage is that Cassandra maintains the index for you.

For situations where secondary indexes are not recommended, you make your life a lot easier if all modifications of the indexed entity (like your user) are executed by one single piece of code, which is then also responsible for maintaining all associated indexes. Then write tests to ensure that it works for every kind of insert, update and delete.
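
To illustrate what I mean by one single piece of code, here is a minimal sketch in Python. It is not real Cassandra code: plain dicts stand in for the tables, and the names UserStore, INDEXED_FIELDS, save, delete and find are made up for this example. The point is only that every write goes through one class, which compares the old and new values and adjusts the index rows accordingly.

```python
from collections import defaultdict


class UserStore:
    """Single write path for users and their index tables.

    Plain dicts stand in for Cassandra tables; all names are hypothetical.
    """

    # Fields that have an index table (assumed for this example).
    INDEXED_FIELDS = ("email", "permissions")

    def __init__(self):
        self.users = {}  # user ID -> user data
        # index name -> search key -> set of user IDs
        self.indexes = {f: defaultdict(set) for f in self.INDEXED_FIELDS}

    @staticmethod
    def _keys(user, field):
        """Return the set of index keys a user contributes for one field."""
        value = user.get(field)
        if value is None:
            return set()
        if isinstance(value, (list, set, tuple)):
            return set(value)
        return {value}

    def _remove(self, field, key, user_id):
        """Drop a user ID from an index row; drop the row once it is empty."""
        row = self.indexes[field][key]
        row.discard(user_id)
        if not row:
            del self.indexes[field][key]

    def save(self, user_id, user):
        """Insert or update a user and bring every index in line."""
        old = self.users.get(user_id, {})
        for field in self.INDEXED_FIELDS:
            old_keys = self._keys(old, field)
            new_keys = self._keys(user, field)
            for key in old_keys - new_keys:
                self._remove(field, key, user_id)
            for key in new_keys - old_keys:
                self.indexes[field][key].add(user_id)
        self.users[user_id] = dict(user)

    def delete(self, user_id):
        """Delete a user and remove the ID from every index row."""
        old = self.users.pop(user_id, {})
        for field in self.INDEXED_FIELDS:
            for key in self._keys(old, field):
                self._remove(field, key, user_id)

    def find(self, field, key):
        """Look up user IDs by an indexed value."""
        return set(self.indexes[field].get(key, set()))
```

Note that this sketch reads the old user data before writing, so it only works as long as the old values are still available at that point.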

I understand that Cassandra 2.0 supports lightweight transactions. I haven't looked at them yet, but they could also help maintain your data integrity, if a failed update of one of your indexes results in a rollback of the entire transaction.

I hope this is helpful to you.

On Mon, Nov 4, 2013 at 12:20 PM, Thomas Stets <thomas.stets@gmail.com> wrote:
What is the best way to manage index tables on update/deletion of the indexed data?

I have a table containing all kinds of data for a user, i.e. name, address, contact data, company data etc. The key to this table is the user ID.

I also maintain about a dozen index tables matching my queries, like name, email address, company D.U.N.S number, permissions the user has, etc. These index tables contain the user IDs matching the search key as column names, with the column values left empty.

Whenever a user is deleted or updated I have to make sure to update the index tables, i.e. if the permissions of a user change I have to remove the user ID from the rows matching the permissions he no longer has.

My problem is to find all matching entries, especially for data I no longer have.

My solution so far is to keep a separate table to keep track of all index tables and keys the user can be found in. In the case mentioned I look up the keys for the permissions table, remove the user ID from there, then remove the entry in the keys table.
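
As a rough illustration of this bookkeeping, here is a sketch in Python. It is not the production code: in-memory dicts stand in for the Cassandra tables, and the names index_tables, keys_table and the four functions are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# index table name -> search key -> set of user IDs
# (the user IDs are the column names in the real index tables)
index_tables = defaultdict(lambda: defaultdict(set))

# user ID -> set of (index table name, search key) pairs the user appears under
keys_table = defaultdict(set)


def add_to_index(user_id, table, key):
    """Add the user to one index row and record that fact in the keys table."""
    index_tables[table][key].add(user_id)
    keys_table[user_id].add((table, key))


def remove_from_index(user_id, table, key):
    """Remove the user from one index row, then remove the keys table entry."""
    index_tables[table][key].discard(user_id)
    keys_table[user_id].discard((table, key))


def set_index_keys(user_id, table, new_keys):
    """Make the user's entries in one index table match new_keys.

    The previous keys come from the keys table, not from the user data,
    so this also works when the old values are no longer available.
    """
    current = {k for (t, k) in keys_table[user_id] if t == table}
    wanted = set(new_keys)
    for key in current - wanted:
        remove_from_index(user_id, table, key)
    for key in wanted - current:
        add_to_index(user_id, table, key)


def delete_user_from_indexes(user_id):
    """Remove the user from every index row recorded in the keys table."""
    for table, key in list(keys_table[user_id]):
        remove_from_index(user_id, table, key)
    del keys_table[user_id]
```

For example, calling set_index_keys("u1", "permissions", ["read"]) after the user previously had "read" and "write" removes the user ID from the "write" row only.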

This works so far (in production for more than a year and a half), and it also allows me to clean up after something has gone wrong.

But still, this additional level of meta information adds a lot of complexity. I was wondering whether there is some kind of pattern that addresses my problem. I found lots of information saying that creating the index tables is the way to go, but nobody ever mentions maintaining the index tables.

tia, Thomas


Drillster BV
Middenburcht 136
3452MT Vleuten

+31 30 755 5330