Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Reply-To: user@cassandra.apache.org
Date: Tue, 20 Apr 2010 18:02:43 -0700
Subject: Re: How to increase cassandra's performance in read?
From: Benjamin Black
To: user@cassandra.apache.org

On Tue, Apr 20, 2010 at 11:54 AM, Mark Jones wrote:
> When I look at this arrangement, I see one lookup by key for the user, followed by a large read for all the "email indexes" (these are all columns in the same row, right?)
>
> Then one lookup by key for each email.... Seems very seek intensive.
>

Do you need to grab every single email every single time? Seems to me you only need the recent ones or a page full. A single multiget would do it, and the load is spread across the cluster.

>...
>
>
> Ok, so If I do it this way, the # of keys rapidly goes into the billions, does that not cause other problems?

Not generally. Cassandra is built to handle enormous numbers of rows efficiently.

>Seems like many more data/index files....
>

Only if you aren't compacting for some reason.


b
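
A minimal sketch of the access pattern suggested above: read only the newest page of email keys from the user's index row, then fetch those emails in one batched lookup instead of one read per email. This is not Cassandra client code. The two dictionaries are hypothetical in-memory stand-ins for the index row and the email rows, and the names `recent_email_keys`, `multiget`, and `fetch_recent_emails` are invented for illustration; a real client's batched read call would take the place of `multiget`.

```python
from typing import Dict, List

# Hypothetical stand-in for the per-user index row: user key -> email keys,
# assumed to be stored oldest to newest (like time-ordered columns in one row).
user_email_index: Dict[str, List[str]] = {
    "user:42": ["email:001", "email:002", "email:003", "email:004", "email:005"],
}

# Hypothetical stand-in for the email rows: one row per email key.
emails: Dict[str, str] = {
    "email:001": "body 1",
    "email:002": "body 2",
    "email:003": "body 3",
    "email:004": "body 4",
    "email:005": "body 5",
}


def recent_email_keys(index: Dict[str, List[str]], user_key: str, page_size: int) -> List[str]:
    """Return up to page_size email keys for the user, newest first."""
    if page_size <= 0:
        return []
    columns = index.get(user_key, [])
    return columns[-page_size:][::-1]


def multiget(store: Dict[str, str], keys: List[str]) -> Dict[str, str]:
    """Batched lookup: one request for many keys; missing keys are skipped."""
    return {key: store[key] for key in keys if key in store}


def fetch_recent_emails(user_key: str, page_size: int) -> List[str]:
    """One index read plus one batched read, rather than one read per email."""
    keys = recent_email_keys(user_email_index, user_key, page_size)
    rows = multiget(emails, keys)
    return [rows[key] for key in keys if key in rows]


if __name__ == "__main__":
    print(fetch_recent_emails("user:42", 2))
```

With a page size of 2, only two email rows are touched no matter how many emails the user has, which is the point of paging: the cost of a read tracks the page size rather than the size of the mailbox.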