Subject: Re: Range scan performance in 0.6.0 beta2
From: Henrik Schröder
To: user@cassandra.apache.org
Date: Fri, 26 Mar 2010 13:40:48 +0100

> So all the values for an entire index will be in one row? That
> doesn't sound good.
>
> You really want to put each index [and each table] in its own CF, but
> until we can do that dynamically (0.7) you could at least make the
> index row keys a tuple of (indexid, indexvalue) and the column names
> in each row the object keys (empty column values).
>
> This works pretty well for a lot of users, including Digg.

We tested your suggestions like this:

We're using the OrderPreservingPartitioner.
We set the keycache and rowcache to 40%.
We're using the same machine as before, but we switched to a 64-bit JVM and gave it 5GB of memory.
For each indexvalue we insert a row where the key is indexid + ":" + indexvalue encoded as a hex string, and the row contains only one column, where the name is the object key encoded as a bytearray, and the value is empty.

When reading, we do a get_range_slice with an empty slice_range (start and finish are 0-length byte arrays), randomly generated start_key and finish_key that we know have both been inserted, and finally a row_count of 1000.
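The key scheme above can be sketched in a few lines of Python. This is only an illustration of the encoding, not a Cassandra client call; the function names and the example index id and object key are ours:

```python
import binascii


def index_row_key(index_id: str, index_value: bytes) -> str:
    """Row key for one (indexid, indexvalue) pair: the index id,
    a ':' separator, then the value hex-encoded. Hex encoding is
    order-preserving on bytes, which matters when the keys are laid
    out by the OrderPreservingPartitioner."""
    return index_id + ":" + binascii.hexlify(index_value).decode("ascii")


def index_row(index_id: str, index_value: bytes, object_key: bytes):
    """One row per indexvalue: a single column whose name is the
    object key (raw bytes) and whose value is empty."""
    return index_row_key(index_id, index_value), {object_key: b""}


# Hypothetical example: index "user_email" mapping an email to object key b"user:42"
key, columns = index_row("user_email", b"alice@example.com", b"user:42")
```

A range scan over such keys then only has to bound the hex-encoded part, since all rows of one index share the `indexid:` prefix.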
These are the numbers we got this time:
inserts (15 threads, batches of 10): 4000/second
get_range_slices (10 threads, row_count 1000): 50/second at start, down to 10/second at 250k inserts.

These numbers are slightly better than our previous OPP tries, but nothing significant. For what it's worth, if we're only doing writes, the machine bottlenecks on disk I/O as expected, but whenever we do reads, it bottlenecks on CPU usage instead. Is this expected?

Also, how would dynamic column families help us? In our tests, we only tested a single "index", so even if we had one column family per "index", we would still only write to one of them and then get the exact same results as above, right?

We're really grateful for any help with both how to tune Cassandra and how to design our data model. The designs we've tested so far are the best we could come up with ourselves; all we really need is a way to store groups of mappings of indexvalue->objectkey, and be able to get a range of objectkeys back given a group and a start and stop indexvalue.

/Henrik
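The read pattern described here relies on hex encoding preserving byte order, so that a key-range scan between two (indexid, indexvalue) bounds returns exactly the indexvalues in between. A small standalone sketch (the `range_slice` helper is our stand-in for the server-side scan, not a real client API):

```python
import binascii


def hexkey(index_id: str, value: bytes) -> str:
    """Row key as described in the thread: indexid + ':' + hex(value)."""
    return index_id + ":" + binascii.hexlify(value).decode("ascii")


# Hex encoding maps each byte to two lowercase hex digits, so the
# lexicographic order of the keys matches the byte order of the values.
values = [b"\x00\xff", b"\x01\x00", b"\x10", b"\xfe"]
keys = [hexkey("grp", v) for v in values]
assert values == sorted(values) and keys == sorted(keys)


def range_slice(all_keys, start_key, finish_key, row_count=1000):
    """Simulated get_range_slices: up to row_count keys with
    start_key <= key <= finish_key, in key order."""
    return [k for k in sorted(all_keys) if start_key <= k <= finish_key][:row_count]


rows = range_slice(keys, hexkey("grp", b"\x01\x00"), hexkey("grp", b"\xfe"))
# rows covers the three values from b"\x01\x00" through b"\xfe"
```

Because every row of one group shares the `grp:` prefix, bounding the scan by two keys of the same group never picks up rows from another index.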