cassandra-user mailing list archives

From Eldad Yamin <elda...@gmail.com>
Subject Re: Cassandra Secondary index/Twissandra
Date Sun, 10 Jul 2011 07:14:57 GMT
Aaron - Thank you for the fast response!


   1. Does performance decrease (significantly) if the uniqueness of the
   column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
   lots of columns?

>Depends on what sort of operations you are doing. Some read operations have
to pay a constant cost to decode the row level column index, this can be
tuned though. AFAIK the comparator type has very little to do with the
performance.

In Twissandra, the columns serve as an "alternative" index for the
Userline/Timeline, so the operation I'm going to run is a slice_range:
fetching (for example) the first 50 columns, with a comparator of
TimeUUID/LONG.
Can you recommend a better way of doing that, or a way to tune Cassandra
to support those 2 CFs?
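
To make the access pattern concrete, here is a small pure-Python model of the slice I have in mind. It does not talk to Cassandra at all: the `slice_range` function, the `timeline_row` data and the tweet ids are hypothetical names made up for illustration, and the row is assumed to be a list of (column name, value) pairs kept sorted by a LONG column name, roughly as the comparator would keep them.

```python
import bisect

def slice_range(columns, start=None, count=50, reverse=False):
    """Return up to `count` columns from a row, beginning at `start`.

    `columns` is a list of (name, value) pairs sorted ascending by name,
    standing in for one wide row ordered by its comparator.
    """
    names = [name for name, _ in columns]
    if reverse:
        # Walk backwards from `start` (or from the end of the row).
        end = len(names) if start is None else bisect.bisect_right(names, start)
        begin = max(0, end - count)
        return columns[begin:end][::-1]
    # Walk forwards from `start` (or from the beginning of the row).
    begin = 0 if start is None else bisect.bisect_left(names, start)
    return columns[begin:begin + count]

# Hypothetical Timeline row: column name = timestamp (LONG), value = tweet id.
timeline_row = [(ts, "tweet-%d" % ts) for ts in range(1000, 1200)]

# First 50 columns in comparator order.
first_page = slice_range(timeline_row, count=50)

# Next 50, starting just after the last column name already seen.
next_page = slice_range(timeline_row, start=first_page[-1][0] + 1, count=50)

# Newest 50 columns, i.e. the same slice read in reverse order.
newest_page = slice_range(timeline_row, count=50, reverse=True)
```

In this model the cost of a page depends on where the slice starts and how many columns it returns, not on whether the names are LONG or TimeUUID, which is the behaviour I am hoping a real row has too.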


Thanks!

On Sun, Jul 10, 2011 at 3:26 AM, aaron morton <aaron@thelastpickle.com> wrote:

>
>    1. Is there a limit on the number of columns in a single column family
>    that serve as secondary indexes?
>
> AFAIK there is no coded limit, however every index is implemented as
> another (hidden) Column Family that inherits the settings of the parent CF.
> So under 0.7 you may run out of memory, under 0.8 you may flush a lot.
> Also, when an indexed column is updated there are potentially 3 operations
> that have to happen: read the old value, delete the old value, write the new
> value. More indexes == more index updating, just like any other database.
>
>
>    1. Does performance decrease (significantly) if the uniqueness of the
>    column’s values is high?
>
> Low cardinality is recommended
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html
>
>
>    1. The CF for "Userline"/"Timeline" - have comparator of "LONG_TYPE"
>    and not TimeUUID?
>
> Probably just to make the demo easier. It's used to order tweets in the
> user and public timelines by the current time
> https://github.com/twissandra/twissandra/blob/master/cass.py#L204
>
>
>    1. Does performance decrease (significantly) if the uniqueness of the
>    column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
>    lots of columns?
>
> Depends on what sort of operations you are doing. Some read operations have
> to pay a constant cost to decode the row level column index, this can be
> tuned though. AFAIK the comparator type has very little to do with the
> performance.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9 Jul 2011, at 12:15, Eldad Yamin wrote:
>
> Hi,
> I have a few questions:
>
> *Secondary index*
>
>    1. Is there a limit on the number of columns in a single column family
>    that serve as secondary indexes?
>    2. Does performance decrease (significantly) if the uniqueness of the
>    column’s values is high?
>
>
> *Twissandra*
>
>    1. Why in the source (or any tutorial I've read):
>    The CF for "Userline"/"Timeline" - have comparator of "LONG_TYPE" and
>    not TimeUUID?
>
>    https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
>    2. Does performance decrease (significantly) if the uniqueness of the
>    column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
>    lots of columns?
>
>
> Thanks!
> Eldad
>
>
>
