Subject: Re: Range scan performance in 0.6.0 beta2
From: Henrik Schröder
To: user@cassandra.apache.org
Date: Fri, 26 Mar 2010 13:40:48 +0100

> So all the values for an entire index will be in one row? That
> doesn't sound good.
>
> You really want to put each index [and each table] in its own CF, but
> until we can do that dynamically (0.7) you could at least make the
> index row keys a tuple of (indexid, indexvalue) and the column names
> in each row the object keys (empty column values).
>
> This works pretty well for a lot of users, including Digg.

We tested your suggestions like this:

We're using the OrderPreservingPartitioner.
We set the keycache and rowcache to 40%.
We're using the same machine as before, but we switched to a 64-bit JVM and gave it 5GB of memory.
For each indexvalue we insert a row where the key is indexid + ":" + indexvalue encoded as a hex string, and the row contains only one column, where the name is the object key encoded as a bytearray, and the value is empty.

When reading, we do a get_range_slice with an empty slice_range (start and finish are 0-length byte arrays), randomly generated start_key and finish_key that we know have both been inserted, and finally a row_count of 1000.
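The key scheme above can be sketched in a few lines of Python. This is only an illustration of the encoding, not a Cassandra client call; the function names and the example index id and object key are ours:

```python
import binascii


def index_row_key(index_id: str, index_value: bytes) -> str:
    """Row key for one (indexid, indexvalue) pair: the index id,
    a ':' separator, then the value hex-encoded. Hex encoding is
    order-preserving on bytes, which matters when the keys are laid
    out by the OrderPreservingPartitioner."""
    return index_id + ":" + binascii.hexlify(index_value).decode("ascii")


def index_row(index_id: str, index_value: bytes, object_key: bytes):
    """One row per indexvalue: a single column whose name is the
    object key (raw bytes) and whose value is empty."""
    return index_row_key(index_id, index_value), {object_key: b""}


# Hypothetical example: index "user_email" mapping an email to object key b"user:42"
key, columns = index_row("user_email", b"alice@example.com", b"user:42")
```

A range scan over such keys then only has to bound the hex-encoded part, since all rows of one index share the `indexid:` prefix.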
These are the numbers we got this time:
inserts (15 threads, batches of 10): 4000/second
get_range_slices (10 threads, row_count 1000): 50/second at start, down to 10/second at 250k inserts.

These numbers are slightly better than our previous OPP tries, but nothing significant. For what it's worth, if we're only doing writes, the machine bottlenecks on disk I/O as expected, but whenever we do reads, it bottlenecks on CPU usage instead. Is this expected?

Also, how would dynamic column families help us? In our tests, we only tested a single "index", so even if we had one column family per "index", we would still only write to one of them and then get the exact same results as above, right?

We're really grateful for any help with both how to tune Cassandra and how to design our data model. The designs we've tested so far are the best we could come up with ourselves; all we really need is a way to store groups of mappings of indexvalue->objectkey, and be able to get a range of objectkeys back given a group and a start and stop indexvalue.

/Henrik
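The read pattern described here relies on hex encoding preserving byte order, so that a key-range scan between two (indexid, indexvalue) bounds returns exactly the indexvalues in between. A small standalone sketch (the `range_slice` helper is our stand-in for the server-side scan, not a real client API):

```python
import binascii


def hexkey(index_id: str, value: bytes) -> str:
    """Row key as described in the thread: indexid + ':' + hex(value)."""
    return index_id + ":" + binascii.hexlify(value).decode("ascii")


# Hex encoding maps each byte to two lowercase hex digits, so the
# lexicographic order of the keys matches the byte order of the values.
values = [b"\x00\xff", b"\x01\x00", b"\x10", b"\xfe"]
keys = [hexkey("grp", v) for v in values]
assert values == sorted(values) and keys == sorted(keys)


def range_slice(all_keys, start_key, finish_key, row_count=1000):
    """Simulated get_range_slices: up to row_count keys with
    start_key <= key <= finish_key, in key order."""
    return [k for k in sorted(all_keys) if start_key <= k <= finish_key][:row_count]


rows = range_slice(keys, hexkey("grp", b"\x01\x00"), hexkey("grp", b"\xfe"))
# rows covers the three values from b"\x01\x00" through b"\xfe"
```

Because every row of one group shares the `grp:` prefix, bounding the scan by two keys of the same group never picks up rows from another index.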