incubator-cassandra-user mailing list archives

From Maxim Potekhin <potek...@bnl.gov>
Subject Re: Data Model Design for Login Service
Date Fri, 18 Nov 2011 01:08:02 GMT
1122: {
           gender: MALE
           birthdate: 1987.11.09
           name: Alfred Tester
           pwd: e72c504dc16c8fcd2fe8c74bb492affa
           alias1: alfred.tester@xyz.de
           alias2: alfred@aad.de
           alias3: alf@dd.de
          }

...and you can use secondary indexes to query on anything.
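
A minimal sketch of how such a lookup could behave, using plain Python dictionaries rather than a real Cassandra client. It assumes one index per alias column; `build_index`, `find_by_login` and `ALIAS_COLUMNS` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Rows keyed by customer ID, as in the layout above.
rows = {
    1122: {
        "gender": "MALE",
        "birthdate": "1987.11.09",
        "name": "Alfred Tester",
        "pwd": "e72c504dc16c8fcd2fe8c74bb492affa",
        "alias1": "alfred.tester@xyz.de",
        "alias2": "alfred@aad.de",
        "alias3": "alf@dd.de",
    },
}

# Assumption: every alias column gets its own index.
ALIAS_COLUMNS = ("alias1", "alias2", "alias3")


def build_index(rows, column):
    """Map each value of `column` to the set of row keys that hold it."""
    index = defaultdict(set)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(key)
    return index


indexes = {col: build_index(rows, col) for col in ALIAS_COLUMNS}


def find_by_login(login):
    """Return (customer_id, columns) for a login, or None if unknown."""
    for col in ALIAS_COLUMNS:
        for key in indexes[col].get(login, ()):
            return key, rows[key]
    return None
```

In this sketch a login may sit in any alias column, so a lookup has to probe every alias index until one matches.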

Maxim


On 11/17/2011 4:08 PM, Maciej Miklas wrote:
> Hello all,
>
> I need your help designing the data structure for a simple login 
> service. It contains about 100.000.000 customers, and each one can 
> have about 10 different logins - this results in about 1.000.000.000 
> different logins.
>
> Each customer has the following data:
> - one to many login names as string, max 20 UTF-8 characters long
> - ID as long - one customer has only one ID
> - gender
> - birth date
> - name
> - password as MD5
>
> The login process needs to find a user by login name.
> Data in Cassandra is replicated - this is necessary to obtain all 
> required login data in a single call. We also expect low write 
> traffic and heavy read traffic - round trips for reading data should 
> be avoided.
> Below I've described two possible Cassandra data models based on an 
> example: we have two users, the first user has three logins and the 
> second user has two logins
>
> A) Skinny rows
>  - the row key is the login name - this is the main search criterion
>  - login data is replicated - each possible login is stored as a single 
> row which contains all user data - 10 logins for a single customer 
> create 10 rows, where each row has a different key and the same content
>
>     // the first 3 rows have different keys and the same replicated data
> alfred.tester@xyz.de {
>           id: 1122
>           gender: MALE
>           birthdate: 1987.11.09
>           name: Alfred Tester
>           pwd: e72c504dc16c8fcd2fe8c74bb492affa
>         },
> alfred@aad.de {
>           id: 1122
>           gender: MALE
>           birthdate: 1987.11.09
>           name: Alfred Tester
>           pwd: e72c504dc16c8fcd2fe8c74bb492affa
>         },
> alf@dd.de {
>           id: 1122
>           gender: MALE
>           birthdate: 1987.11.09
>           name: Alfred Tester
>           pwd: e72c504dc16c8fcd2fe8c74bb492affa
>         },
>
>     // the following two rows again hold the same data for the second customer
> manfred@xyz.de {
>           id: 1133
>           gender: MALE
>           birthdate: 1997.02.01
>           name: Manfredus Maximus
>           pwd: e44c504ff16c8fcd2fe8c74bb492adda
>         },
> roberrto@xyz.de {
>           id: 1133
>           gender: MALE
>           birthdate: 1997.02.01
>           name: Manfredus Maximus
>           pwd: e44c504ff16c8fcd2fe8c74bb492adda
>         }
>
> B) Rows grouped by alphabetical prefix
> - The number of rows is limited - for example, the row key is the 
> first letter of the login name
> - Each row contains all logins which begin with the row key - the row 
> with key 'a' contains all logins which begin with 'a'
> - Data might be unbalanced, but we avoid skinny rows - this might have 
> a positive performance impact (??)
> - To avoid super columns, each row contains the columns directly: the 
> column name is the user login and the column value is the 
> corresponding data in a kind of serialized form (I would like it to be 
> human readable)
>
>     a {
>         alfred.tester@xyz.de: "1122;MALE;1987.11.09;Alfred Tester;e72c504dc16c8fcd2fe8c74bb492affa",
>         alfred@aad.de: "1122;MALE;1987.11.09;Alfred Tester;e72c504dc16c8fcd2fe8c74bb492affa",
>         alf@dd.de: "1122;MALE;1987.11.09;Alfred Tester;e72c504dc16c8fcd2fe8c74bb492affa"
>       },
>
>     m {
>         manfred@xyz.de: "1133;MALE;1997.02.01;Manfredus Maximus;e44c504ff16c8fcd2fe8c74bb492adda"
>       },
>
>     r {
>         roberrto@xyz.de: "1133;MALE;1997.02.01;Manfredus Maximus;e44c504ff16c8fcd2fe8c74bb492adda"
>       }
>
> Which solution is better, especially for read performance? Do 
> you have a better idea?
>
> Thanks,
> Maciej
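
For comparison, the two quoted layouts can be sketched side by side with plain Python dictionaries. This is only an in-memory illustration of the lookup paths, not Cassandra code; `USERS`, `build_model_a`, `build_model_b`, `lookup_a` and `lookup_b` are hypothetical names, and the `;` separator is taken from the example in B.

```python
FIELDS = ("id", "gender", "birthdate", "name", "pwd")

USERS = [
    {"id": 1122, "gender": "MALE", "birthdate": "1987.11.09",
     "name": "Alfred Tester", "pwd": "e72c504dc16c8fcd2fe8c74bb492affa",
     "logins": ["alfred.tester@xyz.de", "alfred@aad.de", "alf@dd.de"]},
    {"id": 1133, "gender": "MALE", "birthdate": "1997.02.01",
     "name": "Manfredus Maximus", "pwd": "e44c504ff16c8fcd2fe8c74bb492adda",
     "logins": ["manfred@xyz.de", "roberrto@xyz.de"]},
]


def build_model_a(users):
    """Model A: row key = login, columns = full copy of the user data."""
    return {login: {f: u[f] for f in FIELDS}
            for u in users for login in u["logins"]}


def build_model_b(users):
    """Model B: row key = first letter of the login, column name = login,
    column value = user data joined with ';'."""
    rows = {}
    for u in users:
        value = ";".join(str(u[f]) for f in FIELDS)
        for login in u["logins"]:
            rows.setdefault(login[:1], {})[login] = value
    return rows


def lookup_a(model, login):
    """Single get by row key."""
    return model.get(login)


def lookup_b(model, login):
    """Get the prefix row, then the column, then parse the value."""
    value = model.get(login[:1], {}).get(login)
    if value is None:
        return None
    record = dict(zip(FIELDS, value.split(";")))
    record["id"] = int(record["id"])
    return record


model_a = build_model_a(USERS)
model_b = build_model_b(USERS)
```

Both lookups return the same record for a known login. In this sketch B's parsing breaks if a field such as the name contains `;`. Also, if the 1.000.000.000 logins were spread evenly over, say, 26 one-letter row keys (an assumption; real login names are not evenly distributed), each row would hold roughly 38 million columns.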

