incubator-cassandra-user mailing list archives

From Roland Gude <>
Subject AW: Multi-type column values in single CF
Date Sun, 03 Jul 2011 13:07:41 GMT
You could do the serialization for all your supported datatypes yourself (many serialization
libraries are available, and a pretty thorough benchmark of them can be found here: ) and
prepend the serialized bytes with an identifier for your datatype.
This would not avoid casting, but it would still perform better than serializing
to strings, as is done in your example.
Prepending the id to the values, rather than to the column names, seems better to me, because
you can be sure that a new insertion to some field overwrites the correct column even if its
type has changed.
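
A minimal sketch of this tagged-value idea in Python, assuming a one-byte type id and the standard struct module for the integer encoding. The tag values and the names encode_value and decode_value are made up for illustration and are not part of any Cassandra client; a plain dict stands in for one row.

```python
import struct

# Hypothetical one-byte type ids; any stable mapping would do.
TAG_STR = b"\x01"
TAG_INT = b"\x02"


def encode_value(value):
    """Serialize a value and prepend its type id."""
    if isinstance(value, bool):
        raise TypeError("bool is not supported in this sketch")
    if isinstance(value, int):
        # 8-byte big-endian signed integer
        return TAG_INT + struct.pack(">q", value)
    if isinstance(value, str):
        return TAG_STR + value.encode("utf-8")
    raise TypeError("unsupported type: %r" % type(value))


def decode_value(raw):
    """Read the type id and restore the original value."""
    tag, payload = raw[:1], raw[1:]
    if tag == TAG_INT:
        return struct.unpack(">q", payload)[0]
    if tag == TAG_STR:
        return payload.decode("utf-8")
    raise ValueError("unknown type id: %r" % tag)


# A plain dict stands in for one row; column names stay untouched.
row = {}
row["attr2"] = encode_value(100)
row["attr2"] = encode_value("hundred")  # type changed, same column overwritten
```

Because the column name never encodes the type, the second write replaces the first instead of leaving a stale column behind.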

-----Original Message-----
From: osishkin osishkin [] 
Sent: Sunday, 3 July 2011 13:52
Subject: Multi-type column values in single CF

Hi all,

I need to store column values that are of various data types in a
single column family, i.e. I have column values that are integers,
others that are strings, and maybe more later. All column names are
strings (no comparator problem for me).
The thing is I need to store unstructured data - I do not have fixed
and known-in-advance column names, so I cannot use a fixed static map
for casting the values back to their original type on retrieval from
Cassandra.

My immediate naive thought is to simply prefix every column name with
the type the value needs to be cast back to.
For example, I'll do the following conversion to the columns of some key:
{'attr1': 'val1','attr2': 100}  ~> {'str_attr1' : 'val1', 'int_attr2' : '100'}
and only then send it to Cassandra. This way I know what type to
cast it back to.
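
The conversion above could be sketched in Python roughly as follows. The names CASTS, to_columns and from_columns are hypothetical, only str and int are handled, and no Cassandra client is involved; the dicts stand in for the columns of one key.

```python
# Hypothetical prefix-to-type mapping for the naming scheme above.
CASTS = {"str": str, "int": int}


def to_columns(attrs):
    """Prefix each column name with its value's type; store values as strings."""
    columns = {}
    for name, value in attrs.items():
        prefix = type(value).__name__
        if prefix not in CASTS:
            raise TypeError("unsupported type: " + prefix)
        columns[prefix + "_" + name] = str(value)
    return columns


def from_columns(columns):
    """Strip the prefix and cast each value back to its original type."""
    attrs = {}
    for col_name, raw in columns.items():
        # Split only on the first underscore so names may contain underscores.
        prefix, name = col_name.split("_", 1)
        attrs[name] = CASTS[prefix](raw)
    return attrs


stored = to_columns({"attr1": "val1", "attr2": 100})
restored = from_columns(stored)
```

Note that if attr2 is later written as a string, it lands in a new column str_attr2, and the old int_attr2 column stays behind unless it is deleted explicitly.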

But all this casting back and forth on the client side seems to me to
be very bad for performance.
Another option is to split the columns across dedicated column families
with matching validation types - a column family for integer values,
one for string, one for timestamp etc.
But that does not seem very efficient either (and worse for any
rollback mechanism), since now I have to perform several get calls on
multiple CFs where once I had only one.
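
A rough Python sketch of this second option, assuming one column family per value type. The column family names and the helpers split_by_cf and merge_from_cfs are hypothetical, and the actual reads and writes against Cassandra are left out.

```python
# Hypothetical column family names, one per value type.
CF_BY_TYPE = {int: "attrs_int", str: "attrs_str"}


def split_by_cf(attrs):
    """Group a row's columns by the column family that matches their type."""
    batches = {}
    for name, value in attrs.items():
        cf = CF_BY_TYPE.get(type(value))
        if cf is None:
            raise TypeError("unsupported type: %r" % type(value))
        batches.setdefault(cf, {})[name] = value
    return batches


def merge_from_cfs(per_cf_columns):
    """Recombine the results of one read per column family into one row."""
    row = {}
    for columns in per_cf_columns.values():
        row.update(columns)
    return row


batches = split_by_cf({"attr1": "val1", "attr2": 100})
merged = merge_from_cfs(batches)
```

Each key in batches corresponds to a separate write, and reading the row back takes one get per column family before the merge.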

I thought perhaps someone has encountered a similar situation in the
past, and can offer some advice on the best course of action.

Thank you,
