cassandra-user mailing list archives

From "Marcelo Valle (BLOOMBERG/ LONDON)" <>
Subject Re: to normalize or not to normalize - read penalty vs write penalty
Date Wed, 04 Feb 2015 18:24:36 GMT
Perfect, Tyler.

My feeling was leading me in this direction, but I wasn't able to put it into words as you did.

Thanks a lot for the message.

Subject: Re: to normalize or not to normalize - read penalty vs write penalty

Okay.  Let's assume with denormalization you have to do 1000 writes (and one read per user)
and with normalization you have to do 1 write (and maybe 1000 reads for each user).

If you execute the writes in the most optimal way (batched by partition, if applicable, and
separate, concurrent requests per partition), I think it's reasonable to say you can do 1000
writes in 10 to 20ms.
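The "separate, concurrent requests per partition" idea can be sketched in Python. This is a minimal simulation, not driver code: `write_partition` is a hypothetical stub standing in for a real Cassandra INSERT (ideally an unlogged batch per partition).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real Cassandra write; a real app would
# execute an INSERT (or an unlogged batch per partition) here.
def write_partition(user_id, alert):
    return (user_id, alert["id"])  # pretend the write succeeded

def fan_out_write(alert, user_ids, max_workers=32):
    # Issue one write per user partition concurrently rather than
    # serially, so 1000 writes overlap instead of paying 1000
    # sequential round trips.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: write_partition(u, alert), user_ids))

results = fan_out_write({"id": "alert-1"}, [f"user-{i}" for i in range(1000)])
```

With real network round trips, the wall-clock time for the fan-out approaches the latency of the slowest individual write rather than the sum of all of them.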

Doing 1000 reads is going to take longer.  Exactly how long depends on your systems (SSDs
or not, whether the data is cached, etc.).  But this is probably going to take at least 2x
as long as the writes. 

So, with denormalization, it's 10 to 20ms for all users to see the change (with a median somewhere
around 5 to 10ms).  With normalization, all users *could* see the update almost immediately,
because it's only one write.  However, each of your users needs to read 1000 partitions, which
takes, say, 20 to 50ms.  So effectively, they won't see the changes for 20 to 50ms, unless
they know to read the details for that exact alert.
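The arithmetic above can be made concrete with a few lines of Python. The latency ranges are the illustrative figures from this message, not measurements:

```python
# Illustrative latency ranges from the discussion, not measurements.
denorm_write_ms = (10, 20)  # fan-out of ~1000 concurrent per-partition writes
norm_read_ms = (20, 50)     # one user reading ~1000 alert partitions

# Denormalized: a user sees the change as soon as their partition's
# write lands, so the visibility gap is bounded by the write fan-out.
denorm_gap_ms = denorm_write_ms

# Normalized: the single write is nearly instant, but the change only
# becomes visible after the user's next multi-partition read finishes.
norm_write_ms = 0  # treated as negligible next to the read fan-out
norm_gap_ms = (norm_write_ms + norm_read_ms[0],
               norm_write_ms + norm_read_ms[1])
```

Under these assumptions the normalized read gap dominates, which is the point of the comparison.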

On Wed, Feb 4, 2015 at 11:57 AM, Marcelo Valle (BLOOMBERG/ LONDON) <> wrote:

I don't want to optimize for reads or writes; I want to optimize for the smallest possible
gap between the time I write and the time I read.

Subject: Re: to normalize or not to normalize - read penalty vs write penalty

Roughly how often do you expect to update alerts?  How often do you expect to read the alerts?
 I suspect you'll be doing 100x more reads (or more), in which case optimizing for reads is
definitely the right choice.

On Wed, Feb 4, 2015 at 9:50 AM, Marcelo Valle (BLOOMBERG/ LONDON) <> wrote:

Hello everyone,

I am thinking about the architecture of my application using Cassandra, and I am asking myself
whether or not I should normalize an entity.

I have users and alerts in my application and, for each user, several alerts. The first model
that came to mind was an "alerts" CF with user-id as part of the partition key. This way,
I get fast writes, and my reads will be fast too, as I will always read from a single
partition.

However, I later received a requirement that made my life more complicated. Alerts can be
shared by 1000s of users, and alerts can change. I am building a real-time app, and if I change
an alert, all users related to it should see the change.

Suppose I keep things denormalized: whenever an alert changes, I would need to write to
1000s of records. This way, my write performance would be affected every time I change an
alert.

On the other hand, I could have a CF for users-alerts and another for alert details. Then,
at read time, I would need to query 1000s of alerts for a given user.
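The normalized read path can be sketched as well: one lookup in a users-alerts CF for the user's alert ids, then a concurrent fan-out read of each alert's own partition. The dicts below are in-memory stand-ins for the two column families, and the names are hypothetical; a real app would issue the queries through the Cassandra driver.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for the two column families; a real app would
# query Cassandra here instead of dicts.
users_alerts = {"user-1": [f"alert-{i}" for i in range(1000)]}
alert_details = {f"alert-{i}": {"id": f"alert-{i}", "msg": f"m{i}"}
                 for i in range(1000)}

def read_user_alerts(user_id):
    # One read to list the user's alert ids, then a concurrent
    # fan-out read of each alert's own partition.
    ids = users_alerts[user_id]
    with ThreadPoolExecutor(max_workers=32) as pool:
        return list(pool.map(alert_details.__getitem__, ids))

alerts = read_user_alerts("user-1")
```

The cost of the 1000-partition fan-out is paid on every read by every user, which is the trade-off against the one-time write fan-out of the denormalized model.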

In both situations, there is a gap between the time data is written and the time it's available
to be read. 

I understand that not normalizing will make me use more disk space, but once the data is
written, I will be able to perform as many reads as I want with no performance penalty. Also,
I understand writes are faster than reads in Cassandra, so the gap would be smaller with the
first solution.

I would be glad to hear thoughts from the community.

Best regards,
Marcelo Valle.

Tyler Hobbs
