incubator-cassandra-user mailing list archives

From Chris Burroughs <chris.burrou...@gmail.com>
Subject Re: Number of rows under one partition key
Date Wed, 04 Jun 2014 19:39:35 GMT
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

Although, by the simplistic version-count heuristic, the sheer quantity of 
releases in the 2.0.x line would now satisfy the constraint.

On 05/29/2014 08:08 PM, Paulo Ricardo Motta Gomes wrote:
> Hey,
>
> We are considering upgrading from 1.2 to 2.0. Why don't you consider 2.0
> ready for production yet, Robert? Have you written about this somewhere
> already?
>
> This is a bit off-topic in this discussion, but it would be interesting
> to know; your posts are generally very enlightening.
>
> Cheers,
>
>
> On Thu, May 29, 2014 at 8:51 PM, Robert Coli <rcoli@eventbrite.com> wrote:
>
>> On Thu, May 15, 2014 at 6:10 AM, Vegard Berget <post@fantasista.no> wrote:
>>
>>> I know this has been discussed before, and I know there are limitations
>>> to how many rows one partition key can handle in practice.  But I am
>>> not sure whether the number of rows or the total data size is the
>>> deciding factor.
>>>
>>
>> Both. In terms of data size, partitions beyond a few hundred megabytes
>> begin to see diminishing returns in some cases.
>> Partitions over 64 megabytes are compacted on disk, which should give you a
>> rough sense of what Cassandra considers a "large" partition.
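>>
>> (For reference, that threshold is the in_memory_compaction_limit_in_mb
>> knob in cassandra.yaml; rows larger than it spill over to disk and get
>> the slow two-pass compaction. From a stock yaml, roughly:
>>
>>     # Size limit for rows being compacted in memory.  Larger rows
>>     # will spill over to disk and be compacted the slow way.
>>     in_memory_compaction_limit_in_mb: 64
>>
>> The exact wording and default vary a bit by version.)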
>>
>>
>>> Should we add another partition key to avoid 1 000 000 rows in the same
>>> thrift-row (which is how I understand it is actually stored)?  Or is 1 000
>>> 000 rows okay?
>>>
>>
>> Depending on row size and access patterns, one million rows is not
>> extremely large. There are, however, some row sizes and operations for
>> which this order of magnitude of columns might be slow.
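>>
>> If you do end up splitting, the usual approach is a synthetic bucket in
>> the partition key rather than a second table. A rough sketch (only
>> clientid and id come from your mail; bucket, payload, and the table
>> name are made up):
>>
>>     CREATE TABLE events (
>>         clientid int,
>>         bucket int,    -- e.g. id % 16; fans one client out over 16 partitions
>>         id int,
>>         payload text,
>>         PRIMARY KEY ((clientid, bucket), id)
>>     );
>>
>> Point reads derive the bucket from id on the client side; reading
>> everything for a client becomes 16 partition reads instead of one.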
>>
>>
>>> Are there other considerations, for example compaction strategy, and
>>> should we upgrade to 2.0 because of this?  (We will upgrade anyway, but
>>> if it is recommended we will continue to use 2.0 in development and
>>> upgrade the production environment sooner.)
>>>
>>
>> You should not upgrade to 2.0 in order to address this concern. You should
>> upgrade to 2.0 when it is stable enough to run in production, which IMO is
>> not yet. YMMV.
>>
>>
>>> I have done some testing: inserting a million rows, selecting them all,
>>> counting them, and selecting individual rows (with both clientid and
>>> id).  It seems fine, but I want to ask to be sure that I am on the
>>> right track.
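>>> In rough CQL terms, something like (the table name is illustrative):
>>>
>>>     SELECT * FROM mytable WHERE clientid = 123;               -- all rows
>>>     SELECT COUNT(*) FROM mytable WHERE clientid = 123;        -- count
>>>     SELECT * FROM mytable WHERE clientid = 123 AND id = 456;  -- one row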
>>>
>>
>> If the access patterns you are using perform the way you would like with
>> representative-size data, that sounds reasonable to me.
>>
>> If you are able to select all million rows within a reasonable percentage
>> of the relevant timeout, I presume they cannot be too huge in terms of data
>> size! :D
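>>
>> (The relevant timeout is the per-request read timeout in cassandra.yaml;
>> the default differs between versions, so the value below is illustrative:
>>
>>     read_request_timeout_in_ms: 10000
>>
>> Check the yaml on the cluster you are actually running.)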
>>
>> =Rob

