Hi Tom,

thanks, I take your answer as "nobody else has found an elegant solution, either" :-)

I guess I could use a secondary index for some cases, but there are several reasons I can't use them in most cases. Especially the permissions are problematic.

A user may have dozens of permission for different solutions, and the permissions may have boundary values that limit their applicability. I query for users with any roles for  a specific solution, for users with a specific role within a specific solution, for users with a specific role, solution and boundary value. I have index tables for all of these cases, and since a lot of queries only need users from a specific company I also have index tables where the key also contains the company. Plus, I need to support wild cards in the stored boundary values as well as in the query terms.

With several hundred thousand users in our DB all of this works fine, but maintaining all the index tables is no fun...

We are still on 1.1.8, preparing to switch to 1.2. As far as I know, 2.0 is not yet recommended for production. And while having transaction would definitely be nice, most cases of inconsistent data result from bugs in the code or changed logic.

The data we keep in Cassandra is just a denormalized copy of part of our data in an Oracle DB, so theoretically we could just throw everything away and rebuild. But we can't afford to be down that long.
I have a program that scans the data in our Cassandra DB for inconsistencies, and spits our CQL code to fix it. One complete check alone takes about 2 days.

Anyway, thanks for your help. :-)

Thomas


On Tue, Nov 5, 2013 at 11:00 PM, Tom van den Berge <tom@drillster.com> wrote:
Hi Thomas,

I understand your concerns about ensuring the integrity of your data when having to maintain the indexes yourself. 

In some situations, using Cassandra's built in secondary indexes is more efficient -- when many rows contained the indexed value. Maybe your permissions fall in this category? Obviously, the advantage is that Cassandra will do the maintenance on the index for you.

For situations where secondary indexes are not recommended, you make your life a lot easier if all modifications of the indexed entity (like your user) is executed by one single piece of code, which is then also responsible for maintaining all associated indexes. And write tests to ensure that it works in all possible ways.

I understood that Cassandra 2.0 supports transactions. I haven't looked at it yet, but this could also help maintaining your data integrity, when a failed update of one of your indexes results in a rollback of the entire transaction.

I hope this is helpful to you.
Tom


On Mon, Nov 4, 2013 at 12:20 PM, Thomas Stets <thomas.stets@gmail.com> wrote:
What is the best way to manage index tables on update/deletion of the indexed data?

I have a table containing all kinds of data fora user, i.e. name, address, contact data, company data etc. Key to this table is the user ID.

I also maintain about a dozen index tables matching my queries, like name, email address, company D.U.N.S number, permissions the user has, etc. These index tables contain the user IDs matching the search key as column names, with the column values left empty.

Whenever a user is deleted or updated I have to make sure to update the index tables, i.e. if the permissions of a user changes I have to remove the user ID from the rows matching the permission he no longer has.

My problem is to find all matching entries, especially for data I no longer have.

My solution so far is to keep a separate table to keep track of all index tables and keys the user can be found in. In the case mentioned I look up the keys for the permissions table, remove the user ID from there, then remove the entry in the keys table.

This works so far (in production for more than a year and a half), and it also allows me to clean up after something has gone wrong.

But still, all this additional level of meta information adds a lot of complexity. I was wondering wether there is some kind of pattern that addresses my problem. I found lots of information saying that creating the index tables is the way to go, but nobody ever mentions maintaining the index tables.

tia, Thomas



--

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com