cassandra-user mailing list archives

From Rahul Singh <>
Subject Re: Data model storage optimization
Date Sun, 29 Jul 2018 23:03:39 GMT
How many rows on average per partition?

Let me get this straight: you are bifurcating your partitions on either email or username,
essentially potentially doubling the data, because you don’t have a way to manage a central
system of record of users?

I would do this (my opinion):
Migrate to a single sign-on system that uses one or the other. Map and migrate your data to
use a single record as the “identity”.

I know that seems painful, but I _hate_ perpetuating bad design because someone, in the past,
present, or future, chooses not to solve the problem but to work around it.

This is not a storage optimization problem - it’s a data architecture problem.
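
To illustrate the "single record as identity" idea, here is a minimal in-memory Python sketch.
It is not Cassandra code and not necessarily the design meant above. The names identity_id,
RequestStore and IDENTITY_NS are hypothetical, and the dictionaries only stand in for tables.
It assumes an identity is one (username, email) pair and derives its id with a name-based
UUID, so old and new records get the same id without any bookkeeping.

```python
import uuid
from collections import defaultdict

# Hypothetical namespace for deriving identity ids; any fixed UUID would do.
IDENTITY_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "identity.example.invalid")


def identity_id(username, email):
    """Return the same id for the same (username, email) pair every time."""
    return uuid.uuid5(IDENTITY_NS, f"{username}\x00{email}")


class RequestStore:
    """Dictionaries standing in for two small lookup tables and one data table."""

    def __init__(self):
        self.ids_by_username = defaultdict(set)  # username -> identity ids
        self.ids_by_email = defaultdict(set)     # email -> identity ids
        self.requests = defaultdict(list)        # (time_bucket, id) -> [(ts, request)]

    def record(self, time_bucket, username, email, ts, request):
        ident = identity_id(username, email)
        self.ids_by_username[username].add(ident)
        self.ids_by_email[email].add(ident)
        # The request payload is stored once, not once per lookup key.
        self.requests[(time_bucket, ident)].append((ts, request))

    def _read(self, time_bucket, idents):
        rows = []
        for ident in idents:
            rows.extend(self.requests.get((time_bucket, ident), []))
        return sorted(rows)

    def by_username(self, time_bucket, username):
        return self._read(time_bucket, self.ids_by_username.get(username, ()))

    def by_email(self, time_bucket, email):
        return self._read(time_bucket, self.ids_by_email.get(email, ()))

    def by_both(self, time_bucket, username, email):
        return self._read(time_bucket, [identity_id(username, email)])
```

The trade-off in this sketch is that a query by username or by email becomes a lookup followed
by one read per matching identity, in exchange for storing each request only once.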

On Jul 28, 2018, 3:11 AM -0400, onmstester onmstester <>, wrote:
> The current data model described as table name: ((partition_key),cluster_key),other_column1,other_column2,...
> user_by_name: ((time_bucket, username)),ts,request,email
> user_by_mail: ((time_bucket, email)),ts,request,username
> The reason both keys (username, email) are repeated in both tables is that there may
> be different usernames with the same email, or different emails with the same username,
> and the queries for the data model are:
> 1. username = X
> 2. mail = Y
> 3. username = X and mail = Y (we query one of the tables and, because the result has a
> small number of records, we filter on the other column)
> This data model wastes a lot of storage.
> I thought of using a UUID, hash code, or sequence to handle this, but I can't keep track
> of old vs. new records (the ones that already have a UUID).
> Any recommendations on optimizing the data model to save storage?
