cassandra-user mailing list archives

From Paulo Ricardo Motta Gomes <paulo.mo...@chaordicsystems.com>
Subject Re: Number of rows under one partition key
Date Fri, 30 May 2014 00:08:48 GMT
Hey,

We are considering upgrading from 1.2 to 2.0. Why don't you consider 2.0
ready for production yet, Robert? Have you written about this somewhere
already?

This is a bit off-topic for this discussion, but it would be interesting to
know; your posts are generally very enlightening.

Cheers,


On Thu, May 29, 2014 at 8:51 PM, Robert Coli <rcoli@eventbrite.com> wrote:

> On Thu, May 15, 2014 at 6:10 AM, Vegard Berget <post@fantasista.no> wrote:
>
>> I know this has been discussed before, and I know there are limitations
>> to how many rows one partition key in practice can handle.  But I am not
>> sure if number of rows or total data is the deciding factor.
>>
>
> Both. In terms of data size, partitions larger than a few hundred
> megabytes begin to see diminishing returns in some cases.
> Partitions over 64 megabytes are compacted on disk, which should give you a
> rough sense of what Cassandra considers a "large" partition.
>
>
>> Should we add another partition key to avoid 1 000 000 rows in the same
>> thrift-row (which is how I understand it is actually stored)?  Or is 1 000
>> 000 rows okay?
>>
>
> Depending on row size and access patterns, 1Mn rows is not extremely
> large. There are, however, some row sizes and operations where this order
> of magnitude of columns might be slow.
>
>
>> Other considerations, for example compaction strategy and if we should do
>> an upgrade to 2.0 because of this (we will upgrade anyway, but if it is
>> recommended we will continue to use 2.0 in development and upgrade the
>> production environment sooner)
>>
>
> You should not upgrade to 2.0 in order to address this concern. You should
> upgrade to 2.0 when it is stable enough to run in production, which IMO is
> not yet. YMMV.
>
>
>> I have done some testing, inserting a million rows and selecting them
>> all, counting them and selecting individual rows (with both clientid and
>> id) and it seems fine, but I want to ask to be sure that I am on the right
>> track.
>>
>
> If the access patterns you are using perform the way you would like with
> representative size data, sounds reasonable to me?
>
> If you are able to select all million rows within a reasonable percentage
> of the relevant timeout, I presume they cannot be too huge in terms of data
> size! :D
>
> =Rob
>
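
For reference, the sizing question quoted above can be sketched as simple arithmetic. This is only a rough illustration, not anything Cassandra provides: the function names, the 100-byte average row size, and the modulo bucketing scheme are all assumptions, and the estimate ignores per-cell overhead and compression. The only figure taken from the thread is the 64 megabyte threshold.

```python
# Rough partition sizing sketch. All names and the sample row size are
# illustrative assumptions, not values measured in this thread.

LARGE_PARTITION_BYTES = 64 * 1024 * 1024  # the 64 MB figure quoted above


def estimated_partition_bytes(rows: int, avg_row_bytes: int) -> int:
    """Approximate partition size, ignoring per-cell overhead and compression."""
    return rows * avg_row_bytes


def buckets_needed(rows: int, avg_row_bytes: int,
                   target_bytes: int = LARGE_PARTITION_BYTES) -> int:
    """Number of partitions needed to keep each one under target_bytes."""
    total = estimated_partition_bytes(rows, avg_row_bytes)
    return max(1, -(-total // target_bytes))  # ceiling division


def bucket_for(row_id: int, n_buckets: int) -> int:
    """Hypothetical extra partition key component derived from the row id."""
    return row_id % n_buckets


if __name__ == "__main__":
    rows, avg_row_bytes = 1_000_000, 100  # assumed average row size
    print(estimated_partition_bytes(rows, avg_row_bytes))
    print(buckets_needed(rows, avg_row_bytes))
```

Under these assumptions, 1 000 000 rows of about 100 bytes each come to roughly 100 MB, so two buckets would keep each partition under the 64 MB mark. The trade-off is that selecting all rows for a client would then mean querying every bucket, so whether this is worth doing depends on the access patterns.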



-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200
