incubator-cassandra-user mailing list archives

From Baskar Sikkayan <baskar....@gmail.com>
Subject Re: Project Management
Date Tue, 07 Aug 2012 07:32:32 GMT
Hi,
  Thank you very much for the useful info.
I have one more question.
If I create one more column family based on my query instead of using a
secondary index, will it affect write performance? I would need to
duplicate the data in the second column family on every write.
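
To make the question concrete, here is a rough sketch of the double write I
have in mind. Plain Python dicts stand in for the two column families (this is
not Cassandra client code), and the names users_cf, org_users_cf and add_user
are made up for illustration.

```python
# Plain dicts stand in for the two column families; this is NOT Cassandra client code.
users_cf = {}       # row key: email_id     -> {column name: value}
org_users_cf = {}   # row key: organization -> {email_id: ""} (column names carry the data)


def add_user(email_id, fullname, organization):
    """Write one user to both column families and return the number of writes issued."""
    writes = 0

    # Write 1: the user row itself.
    users_cf[email_id] = {"fullname": fullname, "organization": organization}
    writes += 1

    # Write 2: the duplicated entry in the per-organization row.
    org_users_cf.setdefault(organization, {})[email_id] = ""
    writes += 1

    return writes


n = add_user("a@example.com", "A User", "acme")
```

So every new user costs two inserts instead of one, and I would like to know
whether the second insert hurts noticeably.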

Thanks,
Baskar.S

On Tue, Aug 7, 2012 at 12:18 PM, Roshni Rajagopal <
Roshni.Rajagopal@wal-mart.com> wrote:

> Hi Baskar,
>
> The key aspect here is that you have to think of your queries and
> denormalize. Here are my suggestions based on my understanding so far.
>
> You seem to have 2 queries:
> A) Which users do I have?
> B) Which users belong to a given organization?
>
> The first can be a static column family; these are similar to RDBMS
> 'master data' or 'dimensions' in the DWH world.
> So you can have a users_CF column family where the row key is the primary
> key, for example userid. If you use email id as the primary key instead,
> make sure it is something which will never change (natural key vs
> surrogate key debate).
>
> The second query is where the real power of the data model comes in. You
> would not be having a separate organizations table with a foreign key to
> the users table.
> You would have a column family, say Organizations_Users_CF, with a row key
> corresponding to your 'where clause' needs, here the organization name. And
> then you can have a dynamic list of user names corresponding to each
> organization as column names. One organization can have 3 users (3 cols),
> another can have 10 (10 cols).
> Note it would automatically be sorted by username when you retrieve a row,
> because the comparator is BytesType by default, which works for text sorting.
> If you want some other sort criteria, like say last time logged in, keep
> that as the column name, column value as username. Column names can also
> store some useful information, like a value in itself.
> Sorting is a design time decision.
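
If I understand this layout correctly, it could be sketched roughly as below.
A dict of dicts stands in for the organization-to-users column family, and
sorting the column names by their bytes on read imitates what a bytes
comparator would do. The names org_users and get_row are only illustrative and
are not part of any Cassandra API.

```python
# Hypothetical in-memory stand-in for the organization -> users column family.
# Each row key is an organization; each column name is a user name (values unused).
org_users = {
    "acme": {"carol": "", "alice": "", "bob": ""},   # 3 users, so 3 columns
    "globex": {"zed": "", "yan": ""},                # 2 users, so 2 columns
}


def get_row(cf, row_key):
    """Return the column names of one row, ordered by their raw UTF-8 bytes."""
    row = cf.get(row_key, {})
    return sorted(row, key=lambda name: name.encode("utf-8"))


acme_users = get_row(org_users, "acme")
```

Reading one organization is then a single row lookup, and the users come back
already ordered by name.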
>
>
> I think there have been numerous posts advising against using secondary
> indexes, so try to keep the key of the col family as what you would be
> searching for, as far as possible.
>
> If you have a different query, you can create a new column family. It's OK
> to denormalize and have a separate column family per query.
>
>
> Regards,
> Roshni
>
> On 06/08/12 9:42 PM, "Alain RODRIGUEZ" <arodrime@gmail.com> wrote:
>
> >Cassandra modeling is well documented on the web and a bit too complex
> >to be explained in one mail.
> >
> >I advise you to read a lot before you make modeling choices.
> >
> >You may start with these links :
> >
> >
> >http://www.datastax.com/docs/1.1/ddl/about-data-model#comparing-the-cassandra-data-model-to-a-relational-database
> >
> >http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
> >
> >This link also seems interesting, but I haven't read it yet (it is about
> >indexes):
> >
> >http://www.anuff.com/2011/02/indexing-in-cassandra.html
> >
> >I hope you'll find your answers within this documentation.
> >
> >Alain
> >
> >
> >2012/8/6 Baskar Sikkayan <baskar.sks@gmail.com>:
> >> Hi,
> >>   I want to learn Cassandra and am trying to convert an RDBMS design to
> >> Cassandra.
> >> Assume my app is deployed in multiple data centers.
> >>
> >> DB Design :
> >>
> >>            A) CF : USER
> >>                   1) email_id - primary key
> >>                   2) fullname
> >>                   3) organization - ( I didn't create a separate table
> >>                      for organization )
> >>
> >>            B) CF : ORG_USER
> >>
> >>                  1) organization - Primary Key
> >>                  2) email_id
> >>
> >>                  Here, my intention is to get the users belonging to an
> >> organization.
> >>                  I could make organization in the USER table a
> >> secondary index, but I have heard that this may hurt performance.
> >>                  Could you please clarify which is the better
> >> approach?
> >>
> >>
> >> Thanks,
> >> Baskar.S
>
