incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Modeling Multi-Valued Fields
Date Thu, 10 Mar 2011 21:15:48 GMT
Two approaches here.

First, the "many columns" approach. Have a super column called Email; for each email address,
store the type as the column name and the email address as the column value. In Cassandra you
can store information in the column names as well as the column values, and you do not need
to know the column names to read them back; see get_slice() in the API: http://wiki.apache.org/cassandra/API
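
To make the layout concrete, here is a rough sketch that models a super column family with
plain Python dicts. It is not the Thrift API: the row key "42", the example addresses and the
helper slice_super_column() are all made up for illustration, and the helper only imitates the
idea of slicing a whole super column without knowing its column names.

```python
# In-memory stand-in for a super column family:
# row key -> super column name -> {column name: column value}.
# This only models the layout; it does not talk to Cassandra.
users = {
    "42": {  # row key (the user id); hypothetical value
        "Email": {  # super column
            "home": "user@home.example",  # column name = type, value = address
            "work": "user@work.example",
        },
    },
}


def slice_super_column(cf, row_key, super_column):
    """Return every (name, value) pair under one super column, sorted by name.

    Hypothetical helper: it imitates slicing the whole super column, so the
    caller does not need to know the column names in advance.
    """
    columns = cf.get(row_key, {}).get(super_column, {})
    return sorted(columns.items())


emails = slice_super_column(users, "42", "Email")
for email_type, address in emails:
    print(email_type, address)
```

Adding another address is then just one more column under Email, with no schema change.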


A slight variation is to use a standard CF and pack the type into the column names, e.g.
"email.home" or "email.work". (Mentioned for completeness; not the best approach.)

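A small sketch of the packed-name variation, again with a plain dict standing in for a row in
a standard CF. The column names other than "email.home" and "email.work", the values, and the
helper slice_range() are assumptions for illustration; it assumes column names compare as
plain strings, so a range from "email." up to a high sentinel picks out just the email columns.

```python
# One row of a standard column family, modelled as {column name: value}.
row = {
    "name": "Example User",
    "email.home": "user@home.example",
    "email.work": "user@work.example",
    "phone.home": "555-0100",
}


def slice_range(columns, start, finish):
    """Hypothetical helper: columns whose names fall in [start, finish], in sorted order."""
    return [(name, columns[name]) for name in sorted(columns) if start <= name <= finish]


PREFIX = "email."
# "\uffff" is used as a high sentinel so every name starting with the prefix is included.
email_columns = slice_range(row, PREFIX, PREFIX + "\uffff")

# Strip the prefix to recover the type -> address mapping.
emails = {name[len(PREFIX):]: value for name, value in email_columns}
print(emails)
```
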
Second, the "few columns" approach. Pack all the email addresses for the customer into something
like a JSON document and store that in one column, using a standard CF for the user.

A slight variation is to pack almost everything for the User into a JSON doc and store that.
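
As a sketch of the "few columns" idea, the snippet below packs the email list into one JSON
string and stores it under a single column. The column name "emails" and the addresses are
assumptions, and the dict only stands in for the user's row. Note that changing one address
means reading, modifying and rewriting the whole document.

```python
import json

# The multi-valued field, as the application sees it.
emails = [
    {"type": "home", "address": "user@home.example"},
    {"type": "work", "address": "user@work.example"},
]

# One row of a standard CF; the whole list lives in a single column value.
row = {"emails": json.dumps(emails)}

# Read: pull the column back and unpack it.
loaded = json.loads(row["emails"])

# Update: modify the unpacked list, then overwrite the whole column.
loaded.append({"type": "gmail", "address": "user@gmail.example"})
row["emails"] = json.dumps(loaded)

print(row["emails"])
```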


If you are always pulling back all the data for the user, and you will always want to update
all the data at once, then consider trying the second approach. Otherwise try the first.

IMHO it's better to pull back a bit more data than is needed to the client (e.g. all their
data or all their email addresses), than it is to optimize to read just one particular field.
The overall goal here is to optimize your storage model to support read requests, even if
it means duplication and de-normalisation. 

Hope that helps. 
Aaron

 
On 10 Mar 2011, at 14:43, Cameron Leach wrote:

> Is there a best-practice for modeling multi-valued fields (fields that are repeated or
> collections of fields)? Our current data model allows for a User to store multiple email addresses:
> 
> User {
>   Integer id; //row key  
>   List<Email> emails;
> 
>   Email {
>     String type; //home, work, gmail, hotmail, etc...
>     String address;
>   }
> }
> 
> So if I setup a 'User' column family with an 'Email' super column, how would one support
> multiple email addresses, storing values for the 'type' and 'address' column names? I've seen
> it suggested to have dynamic column names, but this doesn't seem practical, unless someone
> can make it more clear how that strategy would work.
> 
> Thanks!
> 
> 

