cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Cassandra Secondary index/Twissandra
Date Sun, 10 Jul 2011 00:26:30 GMT
> Is there a limit on the number of columns in a single column family that serve as secondary indexes?
AFAIK there is no coded limit. However, every index is implemented as another (hidden) column
family that inherits the settings of the parent CF, so under 0.7 you may run out of memory and
under 0.8 you may flush a lot. Also, when an indexed column is updated, potentially three
operations have to happen: read the old value, delete the old index entry, and write the new
value. More indexes == more index updating, just like any other database.
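To make the read/delete/write sequence concrete, here is a toy in-memory model. It is a sketch under stated assumptions, not Cassandra's code: the class `IndexedColumnFamily`, its `insert`/`lookup` methods, and the `ops` log are all hypothetical. The hidden index CF is modelled as a map from indexed value to the set of row keys holding that value.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Hypothetical toy model of a CF with one secondary index.

    The index is itself a (hidden) column family: each index row is keyed
    by an indexed *value* and holds the row keys of base rows with that value.
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = defaultdict(dict)   # base CF: row key -> {column: value}
        self.index = defaultdict(set)   # hidden index CF: value -> {row keys}
        self.ops = []                   # log of the index work each write costs

    def insert(self, row_key, column, value):
        if column == self.indexed_column:
            old = self.rows[row_key].get(column)   # 1. read the old value
            self.ops.append("read")
            if old is not None:
                self.index[old].discard(row_key)   # 2. delete the old index entry
                self.ops.append("delete")
            self.index[value].add(row_key)         # 3. write the new index entry
            self.ops.append("write")
        self.rows[row_key][column] = value

    def lookup(self, value):
        # Equality query served from the index CF rather than a full scan.
        return sorted(self.index.get(value, ()))


users = IndexedColumnFamily("state")
users.insert("alice", "state", "CA")
users.insert("bob", "state", "CA")
users.insert("alice", "state", "NY")  # read CA, delete CA entry, write NY entry
print(users.lookup("CA"))  # ['bob']
print(users.lookup("NY"))  # ['alice']
```

Each extra indexed column adds its own read/delete/write work to every update of that column, which is the "more indexes == more index updating" point.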
> Does performance decrease (significantly) if the uniqueness of the column’s values is high?
Low cardinality is recommended, see this thread:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html
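One commonly cited reason: each node indexes only the rows it stores, so an index query with no row key has to consult every node's local index. The sketch below is a simplified model under that assumption; `node_for` (an MD5 stand-in for the partitioner), `build_local_indexes` and `query` are hypothetical helpers. With a low-cardinality column each node contacted returns many matches; with a unique column the same fan-out yields a single row.

```python
import hashlib


def node_for(row_key, num_nodes):
    # Hypothetical stand-in for a partitioner: hash the row key onto a node.
    digest = hashlib.md5(row_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def build_local_indexes(rows, column, num_nodes):
    # Each node indexes only its own rows: node -> {value: [row keys]}.
    indexes = [dict() for _ in range(num_nodes)]
    for key, cols in rows.items():
        local = indexes[node_for(key, num_nodes)]
        local.setdefault(cols[column], []).append(key)
    return indexes


def query(indexes, value):
    # Without a row key, every node's local index must be checked.
    hits = []
    for local in indexes:
        hits.extend(local.get(value, []))
    return len(indexes), sorted(hits)


rows = {f"user{i}": {"gender": "MF"[i % 2], "email": f"user{i}@example.com"}
        for i in range(1000)}
low = build_local_indexes(rows, "gender", num_nodes=4)
high = build_local_indexes(rows, "email", num_nodes=4)

nodes, hits = query(low, "F")
print(nodes, len(hits))  # 4 500
nodes, hits = query(high, "user7@example.com")
print(nodes, len(hits))  # 4 1
```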

> The CFs for "Userline"/"Timeline" have a comparator of "LONG_TYPE" and not TimeUUID?
Probably just to make the demo easier. It's used to order tweets in the user and public timelines
by the current time:
https://github.com/twissandra/twissandra/blob/master/cass.py#L204
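A rough sketch of what a LongType comparator buys a timeline row: column names are integers (here, timestamps) kept in numeric order, so "newest first" is a reversed slice. The `Timeline` class is hypothetical and is not Twissandra's code. It also shows the trade-off against TimeUUID: two writes with the same long column name overwrite each other, whereas TimeUUIDs for the same instant are still distinct.

```python
import bisect


class Timeline:
    """Hypothetical model of a timeline row with LongType-ordered column names.

    Column names are integer timestamps kept in numeric order; values are
    tweet ids. Not Twissandra's actual implementation.
    """

    def __init__(self):
        self.names = []    # sorted column names (timestamps)
        self.values = {}   # column name -> tweet id

    def insert(self, ts, tweet_id):
        if ts not in self.values:
            bisect.insort(self.names, ts)
        self.values[ts] = tweet_id   # same column name => overwrite

    def newest(self, count):
        # Reversed slice: highest timestamps (most recent tweets) first.
        return [self.values[ts] for ts in self.names[::-1][:count]]


line = Timeline()
line.insert(1000, "tweet-a")
line.insert(3000, "tweet-c")
line.insert(2000, "tweet-b")
print(line.newest(2))  # ['tweet-c', 'tweet-b']
line.insert(3000, "tweet-d")  # same long name: silently replaces tweet-c
print(line.newest(2))  # ['tweet-d', 'tweet-b']
```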

> Does performance decrease (significantly) if the uniqueness of the column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has lots of columns?
It depends on what sort of operations you are doing. Some read operations have to pay a constant
cost to decode the row-level column index, though this can be tuned. AFAIK the comparator
type has very little to do with performance.
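For wide rows, Cassandra keeps an index of column-name ranges inside each row, with the block size controlled by `column_index_size_in_kb` in cassandra.yaml. The sketch below is a simplified model of that idea, not the on-disk format: `build_column_index` and `find_column` are hypothetical, and blocks are sized by column count rather than bytes. A read binary-searches the small index and then scans one block. Since the search only needs an ordering, the comparator type barely changes the cost.

```python
import bisect


def build_column_index(column_names, block_size):
    """Split sorted column names into blocks; the index keeps each block's first name.

    block_size counts columns here; Cassandra sizes blocks in bytes
    (column_index_size_in_kb), so this is only an approximation.
    """
    names = sorted(column_names)
    blocks = [names[i:i + block_size] for i in range(0, len(names), block_size)]
    first_names = [block[0] for block in blocks]
    return first_names, blocks


def find_column(first_names, blocks, name):
    # Binary search the index, then scan a single block.
    # Returns (found, columns_scanned).
    i = bisect.bisect_right(first_names, name) - 1
    if i < 0:
        return False, 0
    block = blocks[i]
    return name in block, len(block)


names = range(0, 100_000, 2)  # 50,000 even-numbered column names
first, blocks = build_column_index(names, block_size=256)
print(len(first))                        # 196
print(find_column(first, blocks, 4242))  # (True, 256)
print(find_column(first, blocks, 4243))  # (False, 256)
```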

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 Jul 2011, at 12:15, Eldad Yamin wrote:

> Hi,
> I have a few questions:
> 
> Secondary index
> Is there a limit on the number of columns in a single column family that serve as secondary indexes?
> Does performance decrease (significantly) if the uniqueness of the column’s values is high?
> 
> Twissandra
> Why in the source (or any tutorial I've read):
> The CFs for "Userline"/"Timeline" have a comparator of "LONG_TYPE" and not TimeUUID?
> https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
> Does performance decrease (significantly) if the uniqueness of the column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has lots of columns?
> 
> Thanks!
> Eldad

