Date: Mon, 24 Jan 2011 09:26:22 +0100
Subject: Re: Does Cassandra support range queries on keys ?
From: Peter Schuller
To: user@cassandra.apache.org

> Following your suggestions, of using key of super column as range token
> won't I have a storage problem?

You won't get me to proclaim that you won't have a storage problem ;)
If you're going to deploy this at scale, I'm sure you'll have problems
whatever you do...

> I couldn't find information about this so I'll just ask: If I have a
> (Super/)ColumnFamily that contains 1 "key" for the row but that row contains
> millions of k:v entries. Would that be a efficient Cassandra design?
> Does cassandra store a CF row on a single now or can it / should it
> distribute this data?
> Does having millions of k:v entries in a single row of a CF would be
> considered a good practice? (in terms of query time, range scans and co ?)

Replication and distribution happen on a per-row basis: a row is placed
on its replica set as a whole and is not split across nodes. So you
generally don't want individual rows to be a significant part of the
entire data set.

You definitely don't want super columns that are huge; the columns
within an individual super column aren't indexed on disk, for one
thing.

Having large rows with lots of columns... maybe. In general it's
certainly supported, but as for the overall impact if you intend to
have relatively few rows that are all very large, I don't want to say
too much here. Anyone else? (anti-entropy granularity, compaction
in-memory thresholds and GC tweaking, etc)

--
/ Peter Schuller
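
To illustrate the per-row distribution point in the reply above, here is a
minimal toy sketch, not Cassandra's actual code. It assumes a hash-based
partitioner loosely modeled on MD5 hashing of the row key, a hypothetical
4-node ring with evenly spaced tokens, and simple "next nodes on the ring"
replica placement. All names (NODES, token_for, replicas_for, load_per_node)
are made up for the example. The only thing it is meant to show is that the
column name plays no part in placement, so one very wide row lands entirely
on one replica set while many narrow rows spread over the ring.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# Hypothetical 4-node ring (assumption, for illustration only).
NODES = ["node-a", "node-b", "node-c", "node-d"]
RING_SIZE = 2 ** 128  # size of the MD5 output space
# Evenly spaced tokens; the node with token T owns keys hashing into (previous T, T].
TOKENS = [i * RING_SIZE // len(NODES) for i in range(len(NODES))]


def token_for(row_key: str) -> int:
    """Hash the row key onto the ring. Only the row key is hashed."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def replicas_for(row_key: str, replication_factor: int = 2) -> list:
    """Return the nodes holding this row: the owner plus its successors."""
    # First node whose token is >= the key's token, wrapping around the ring.
    owner = bisect_left(TOKENS, token_for(row_key)) % len(NODES)
    return [NODES[(owner + i) % len(NODES)] for i in range(replication_factor)]


def load_per_node(cells, replication_factor: int = 2) -> Counter:
    """Count stored cells per node for an iterable of (row_key, column_name)."""
    load = Counter()
    for row_key, _column_name in cells:  # column name does not affect placement
        for node in replicas_for(row_key, replication_factor):
            load[node] += 1
    return load


# One row with many columns versus many rows with one column each.
wide_row = [("all-events", f"col{i}") for i in range(10_000)]
narrow_rows = [(f"event{i}", "value") for i in range(10_000)]

print("one wide row:    ", dict(load_per_node(wide_row)))
print("many narrow rows:", dict(load_per_node(narrow_rows)))
```

In this model the wide row puts its whole weight on just replication_factor
nodes no matter how many nodes the ring has, which is the reason for not
letting a single row become a significant part of the data set.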