Date: Tue, 9 Mar 2010 14:13:50 -0600
Subject: Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'
From: Brandon Williams
To: cassandra-user@incubator.apache.org

On Tue, Mar 9, 2010 at 1:14 PM, Sylvain Lebresne wrote:

> I've inserted 1000 rows of 100 columns each (python stress.py -t 2 -n
> 1000 -c 100 -i 5).
> If I read, I get roughly the same number of rows whether I read the
> whole row (python stress.py -t 10 -n 1000 -o read -r -c 100) or only
> the first column (python stress.py -t 10 -n 1000 -o read -r -c 1).
> And that's less than 10 rows per second.
>
> So sure, when I read the whole row, that's almost 1000 columns per
> second, which is roughly 50M/s throughput, which is quite good. But
> when I read only the first column, I get 10 columns per second,
> that's 500K/s, which is less good. Now, from what I've understood so
> far, Cassandra doesn't deserialize the whole row to read a single
> column (I'm not using supercolumns here), so I don't understand those
> numbers.
>

Reading a row costs a disk seek, while the columns within a row are
stored contiguously. So if the row isn't in the cache, you're limited
by the seeks. In general, fatter rows should perform better than
skinny ones.

-Brandon
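
To make the seek argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not Cassandra code and not a measurement. The per-row seek cost, the column size, and the sequential bandwidth are all assumed values, picked only so the output lands near the figures quoted above (under 10 rows per second, about 50M/s for whole rows, about 500K/s for one column). The function names are made up for illustration.

```python
# Back-of-the-envelope model of seek-bound reads.
# All constants are illustrative assumptions, not Cassandra measurements.


def read_time_s(columns, column_bytes, seek_s, bandwidth_bytes_per_s):
    """Time to read `columns` contiguous columns from one uncached row."""
    transfer_s = columns * column_bytes / bandwidth_bytes_per_s
    return seek_s + transfer_s


def throughput(columns, column_bytes, seek_s, bandwidth_bytes_per_s):
    """Return (rows per second, bytes per second) for the modeled reads."""
    t = read_time_s(columns, column_bytes, seek_s, bandwidth_bytes_per_s)
    return 1.0 / t, columns * column_bytes / t


SEEK_S = 0.1           # assumed effective cost of one uncached row read
COLUMN_BYTES = 50_000  # assumed; implied by "1000 columns/s is about 50M/s"
BANDWIDTH = 1e9        # assumed sequential read rate, chosen only so that
                       # transfer time stays small next to the seek

for cols in (1, 100):
    rows, bps = throughput(cols, COLUMN_BYTES, SEEK_S, BANDWIDTH)
    print(f"{cols:3d} columns/row: {rows:5.2f} rows/s, {bps / 1e6:6.2f} MB/s")
```

Under these assumptions the rows-per-second figure barely moves between 1 and 100 columns, because the fixed seek cost dominates, while the byte throughput grows almost in proportion to the number of columns read per seek.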