From: Chris Burroughs
Date: Wed, 04 Jun 2014 15:39:35 -0400
To: user@cassandra.apache.org
CC: Vegard Berget
Subject: Re: Number of rows under one partition key

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

Although by the simplistic version-count heuristic, the sheer quantity of
releases in the 2.0.x line would now satisfy the constraint.

On 05/29/2014 08:08 PM, Paulo Ricardo Motta Gomes wrote:
> Hey,
>
> We are considering upgrading from 1.2 to 2.0, why don't you consider 2.0
> ready for production yet, Robert? Have you written about this somewhere
> already?
>
> A bit off-topic in this discussion but it would be interesting to know,
> your posts are generally very enlightening.
>
> Cheers,
>
>
> On Thu, May 29, 2014 at 8:51 PM, Robert Coli wrote:
>
>> On Thu, May 15, 2014 at 6:10 AM, Vegard Berget wrote:
>>
>>> I know this has been discussed before, and I know there are limitations
>>> to how many rows one partition key in practice can handle. But I am not
>>> sure if number of rows or total data is the deciding factor.
>>>
>>
>> Both. In terms of data size, partitions containing over a small number of
>> hundreds of Megabytes begin to see diminishing returns in some cases.
>> Partitions over 64 megabytes are compacted on disk, which should give you a
>> rough sense of what Cassandra considers a "large" partition.
>>
>>
>>> Should we add another partition key to avoid 1 000 000 rows in the same
>>> thrift-row (which is how I understand it is actually stored)? Or is
>>> 1 000 000 rows okay?
>>>
>>
>> Depending on row size and access patterns, 1Mn rows is not extremely
>> large. There are, however, some row sizes and operations where this order
>> of magnitude of columns might be slow.
>>
>>
>>> Other considerations, for example compaction strategy and if we should do
>>> an upgrade to 2.0 because of this (we will upgrade anyway, but if it is
>>> recommended we will continue to use 2.0 in development and upgrade the
>>> production environment sooner)
>>>
>>
>> You should not upgrade to 2.0 in order to address this concern. You should
>> upgrade to 2.0 when it is stable enough to run in production, which IMO is
>> not yet. YMMV.
>>
>>
>>> I have done some testing, inserting a million rows and selecting them
>>> all, counting them and selecting individual rows (with both clientid and
>>> id) and it seems fine, but I want to ask to be sure that I am on the right
>>> track.
>>>
>>
>> If the access patterns you are using perform the way you would like with
>> representative size data, sounds reasonable to me?
>>
>> If you are able to select all million rows within a reasonable percentage
>> of the relevant timeout, I presume they cannot be too huge in terms of data
>> size! :D
>>
>> =Rob
>>
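
For anyone wanting to put rough numbers on the "rows vs. total data" question in the quoted thread, here is a minimal back-of-envelope sketch in Python. It is not anything Cassandra ships: the function names, the integer `id`, the modulo bucketing scheme, and the example row sizes are all assumptions for illustration. The only figure taken from the thread is the 64 MB point above which partitions are compacted on disk. The estimate ignores per-cell overhead and compression, so treat it as an order-of-magnitude guide only.

```python
import math

# The 64 MB figure quoted in the thread; used here only as a sizing target.
LARGE_PARTITION_MB = 64


def estimated_partition_mb(row_count: int, avg_row_bytes: int) -> float:
    """Rough partition size in MB, ignoring storage overhead and compression."""
    return row_count * avg_row_bytes / (1024 * 1024)


def buckets_needed(row_count: int, avg_row_bytes: int,
                   target_mb: float = LARGE_PARTITION_MB) -> int:
    """Smallest bucket count that keeps each estimated partition <= target_mb."""
    total_mb = estimated_partition_mb(row_count, avg_row_bytes)
    return max(1, math.ceil(total_mb / target_mb))


def bucket_for(row_id: int, bucket_count: int) -> int:
    """Hypothetical extra partition key component derived from an integer id,
    e.g. for a key shaped like ((clientid, bucket), id)."""
    return row_id % bucket_count


if __name__ == "__main__":
    rows = 1_000_000
    for avg_bytes in (50, 200):  # assumed average row sizes, not measured
        size = estimated_partition_mb(rows, avg_bytes)
        print(f"{avg_bytes} B/row: ~{size:.1f} MB, "
              f"buckets needed: {buckets_needed(rows, avg_bytes)}")
```

The trade-off with a scheme like this is that reading a single row still works when the id is known (the bucket can be recomputed), but selecting all rows for one client means querying every bucket, so it is only worth doing if the unbucketed partition actually turns out too large for the access patterns in use.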