incubator-cassandra-user mailing list archives

From Chris Lohfink <clohf...@blackbirdit.com>
Subject Re: Doubt
Date Tue, 22 Apr 2014 18:19:12 GMT
Generally I've seen it recommended to use a composite CF, since it gives you more flexibility
and is easier to debug. You can get some performance improvement by storing a serialized
blob to represent your entity (a lot of data can be represented much smaller this way, by a
factor of 10 or more if you are clever), but the complexity is rarely worth it. It is likely
a premature optimization, though I have seen cases where it has shown a good improvement.

In either case, the data will ultimately be read sequentially from disk per sstable (the normal
bottleneck), so the only benefits you gain are:
- potentially disk space (if the serialization is efficient) and network bandwidth
- Cassandra won’t have to deserialize as many columns, but I’m fairly certain this is
utterly irrelevant
- if stored in a format that you can deserialize efficiently (like protobufs), it can make
a big difference on your app side
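
As a rough illustration of the disk space point above, here is a minimal Python sketch that
compares the byte count of one entity stored as separate name/value cells against the same
entity packed into a fixed binary layout. The entity, its field names, and the layout are
hypothetical, and the per-column figure ignores the timestamps and other per-cell overhead that
Cassandra actually stores, so treat the numbers as indicative only.

```python
import struct

# Hypothetical entity; field names and values are illustrative only.
entity = {"user_id": 123456, "age": 42, "score": 98.5, "active": True}

# Assumed fixed layout: uint64 id, uint8 age, float64 score, bool flag.
LAYOUT = struct.Struct("<QBd?")

def per_column_size(e):
    # Crude stand-in for one cell per field: name bytes plus value-as-text bytes.
    # Real cells also carry timestamps and metadata, which are not modeled here.
    return sum(len(k.encode()) + len(str(v).encode()) for k, v in e.items())

def packed_blob(e):
    # Field names are not stored at all; the layout implies them.
    return LAYOUT.pack(e["user_id"], e["age"], e["score"], e["active"])

def unpack_blob(blob):
    user_id, age, score, active = LAYOUT.unpack(blob)
    return {"user_id": user_id, "age": age, "score": score, "active": active}

if __name__ == "__main__":
    print("per-column bytes:", per_column_size(entity))
    print("packed blob bytes:", len(packed_blob(entity)))
```

The saving comes mostly from dropping the repeated column names and from binary encoding of
values. How large it is in practice depends entirely on the data.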

Keep in mind, though, that if you serialize the data you will always have to maintain code that
can read old versions. That can become very complex and lead to weird bugs.
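
To make the maintenance cost concrete, here is a minimal Python sketch of a reader that has to
understand more than one blob version. Everything in it is an assumption for illustration: the
one-byte version prefix, the two layouts, and the field names. It is hand-rolled with the
standard struct module and is not how protobufs encodes data.

```python
import struct

# Hypothetical layouts: v1 stored (user_id, age); v2 later added a score field.
V1 = struct.Struct("<QB")
V2 = struct.Struct("<QBd")

def encode_v2(user_id, age, score):
    # New writes always use the latest version, tagged with a leading version byte.
    return bytes([2]) + V2.pack(user_id, age, score)

def decode(blob):
    # Every version ever written must stay readable for as long as old rows exist.
    version, payload = blob[0], blob[1:]
    if version == 1:
        user_id, age = V1.unpack(payload)
        return {"user_id": user_id, "age": age, "score": None}
    if version == 2:
        user_id, age, score = V2.unpack(payload)
        return {"user_id": user_id, "age": age, "score": score}
    raise ValueError(f"unknown blob version {version}")

if __name__ == "__main__":
    old_row = bytes([1]) + V1.pack(7, 30)  # a row written before score existed
    print(decode(old_row))
    print(decode(encode_v2(7, 30, 1.5)))
```

Each new layout adds another branch to decode, and each branch needs a default for the fields it
predates. With a composite CF, a missing column is simply absent and no such branch is needed.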

---
Chris Lohfink

On Apr 21, 2014, at 3:53 AM, Jagan Ranganathan <jagan@zohocorp.com> wrote:

> Dear All,
> 
> We have a requirement to store 'N' columns of an entity in a CF. Mostly this is write
> once and read many times. What is the best way to store the data?
> Composite CF
> Simple CF with value as protobuf extracted data
> Both provide extendable columns, which is a requirement for our usage.
> 
> But I want to know which one is efficient, assuming there is bound to be say 5% of updates?
> 
> Regards,
> Jagan

