Thank you very much for the useful info.
I have one more question.
If I create one more column family based on my query instead of using a secondary index,
will it affect write performance, since I would need to duplicate the data in the second column family on every write?
The key aspect here is that you have to think of your queries and
denormalize. Here are my suggestions based on my understanding so far.
You seem to have 2 queries:
A) Which users do I have?
B) Which organizations do the users belong to?
The first can be a static column family; these are similar to RDBMS
'master data' or 'dimensions' in the DWH world.
So you can have a users_CF column family where the row key is the primary
key, for example userid. If you use email id as the primary key instead,
make sure it is something which will never change (natural key vs surrogate key).
The second query is where the real power of the data model comes in. You
would not have a separate organizations table with a foreign key to
the users table.
You would have a column family, say Organizations_Users_CF, with a row key
corresponding to your 'where clause' needs, here the organization name. And
then you can have a dynamic list of user names corresponding to each
organization as column names. One organization can have 3 users (3 cols),
another can have 10 (10 cols).
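To make the two column families concrete, here is a minimal in-memory
sketch in Python. It is not Cassandra client code: the dicts users_cf and
organizations_users_cf and the helpers add_user and users_in are
hypothetical stand-ins that only mimic the layout described above (static
rows keyed by user id, and one wide row per organization with user names
as column names).

```python
# Hypothetical in-memory stand-ins for the two column families.
users_cf = {}                # row key: user id -> {column name: value}
organizations_users_cf = {}  # row key: organization -> {username: ""}


def add_user(user_id, username, organization):
    """Write the same fact to both column families (a denormalized write)."""
    users_cf[user_id] = {"username": username, "organization": organization}
    # The column name carries the data; the column value is left empty.
    organizations_users_cf.setdefault(organization, {})[username] = ""


def users_in(organization):
    """Read one wide row, ordering column names by their UTF-8 bytes."""
    row = organizations_users_cf.get(organization, {})
    return sorted(row, key=lambda name: name.encode("utf-8"))


add_user("u1", "carol", "acme")
add_user("u2", "alice", "acme")
add_user("u3", "bob", "initech")

print(users_in("acme"))     # the 'acme' row has 2 columns
print(users_in("initech"))  # the 'initech' row has 1 column
```

Each add_user call touches both structures, which is the extra write you
take on in exchange for answering the organization query from a single row.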
Note it would automatically be sorted by username when you retrieve a row,
because the comparator is BytesType by default, which works for text sorting.
If you want some other sort criterion, say last login time, keep
that as the column name and the username as the column value. Column names
can themselves store useful information, just like a value.
Sorting is a design time decision.
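As a rough illustration of choosing the sort order through the column
name, the sketch below models one organization's row as a plain Python
dict. The names row, record_login and users_by_last_login are
hypothetical, and the timestamps are made-up epoch seconds; a real column
family would do the ordering for you through its comparator.

```python
# Hypothetical wide row for one organization:
# column name = last login time (epoch seconds), column value = username.
row = {}


def record_login(row, login_ts, username):
    """Store the login time as the column name and the username as the value."""
    row[login_ts] = username


def users_by_last_login(row):
    """Columns come back ordered by column name, so order by the dict key."""
    return [row[ts] for ts in sorted(row)]


record_login(row, 1344200000, "carol")
record_login(row, 1344100000, "alice")
record_login(row, 1344300000, "bob")

print(users_by_last_login(row))  # oldest login first
```

Two caveats with this layout: two users logging in at the same timestamp
would collide on the column name, and a repeat login adds a new column
unless the old one is deleted first.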
I think there have been numerous posts advising against using secondary
indexes, so try to keep the key of the column family as what you would be
searching for, as far as possible.
If you have a different query, you can create a new column family. It's OK
to denormalize and have a separate column family per query.
On 06/08/12 9:42 PM, "Alain RODRIGUEZ" <email@example.com> wrote:
>Cassandra modeling is well documented on the web and a bit too complex
>to be explained in one mail.
>I advise you to read a lot before you make modeling choices.
>You may start with these links:
>and this link seems interesting, but I haven't read it yet (about indexes)
>I hope you'll find your answers within this documentation.
>2012/8/6 Baskar Sikkayan <firstname.lastname@example.org>:
>> Just wanted to learn Cassandra and trying to convert RDBMS design to
>> Consider that my app is being deployed in multiple data centers.
>> DB Design :
>> A) CF : USER
>> 1) email_id - primary key
>> 2) fullname
>> 3) organization - (I didn't create a separate table
>> organization)
>> B) CF : ORG_USER
>> 1) organization - Primary Key
>> 2) email_id
>> Here, my intention is to get users belong to an
>> Here, I can make the organization in the user table a
>> secondary index, but I heard that this may hurt performance.
>> Could you please clarify me which is the better