From: James Golick <jamesgolick@gmail.com>
To: cassandra-user@incubator.apache.org
Reply-To: user@cassandra.apache.org
Date: Sun, 2 May 2010 20:57:23 -0700
Subject: Re: Row slice / cache performance

Got a ~50% improvement by making UUID comparison less heavy-weight.
https://issues.apache.org/jira/browse/CASSANDRA-1043
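The gist, for the archives (sketched from memory; class and method names here are illustrative, not the actual CASSANDRA-1043 diff): read the version-1 timestamp straight out of the serialized UUID bytes instead of materializing two java.util.UUID objects for every comparison.

    public final class TimeUUIDBytesComparator
    {
        private static long unsigned(byte[] b, int i)
        {
            return b[i] & 0xFF;
        }

        // Reassemble the 60-bit v1 timestamp: time_hi (bytes 6-7, version
        // nibble masked off) << 48 | time_mid (bytes 4-5) << 32 |
        // time_low (bytes 0-3).
        private static long timestampOf(byte[] b)
        {
            long timeLow = unsigned(b, 0) << 24 | unsigned(b, 1) << 16
                         | unsigned(b, 2) << 8  | unsigned(b, 3);
            long timeMid = unsigned(b, 4) << 8  | unsigned(b, 5);
            long timeHi  = (unsigned(b, 6) & 0x0F) << 8 | unsigned(b, 7);
            return timeHi << 48 | timeMid << 32 | timeLow;
        }

        public static int compare(byte[] a, byte[] b)
        {
            long ta = timestampOf(a);
            long tb = timestampOf(b);
            if (ta != tb)
                return ta < tb ? -1 : 1;
            // Equal timestamps: break the tie bytewise (unsigned).
            for (int i = 0; i < 16; i++)
            {
                int d = (a[i] & 0xFF) - (b[i] & 0xFF);
                if (d != 0)
                    return d;
            }
            return 0;
        }
    }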

On Sun, May 2, 2010 at 7:49 PM, James Golick <jamesgolick@gmail.com> wrote:
Just an update on this. I wrote a patch which attempts to solve this problem by keeping an index of columns that are marked for deletion, to avoid having to iterate over the whole column set and call columns_.get() over and over again.
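In sketch form, the bookkeeping looks something like this (names are made up for illustration, and the Column interface stands in for Cassandra's IColumn; this isn't the actual diff):

    import java.util.Comparator;
    import java.util.Set;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.ConcurrentSkipListSet;

    // Stand-in for the relevant bits of Cassandra's IColumn.
    interface Column
    {
        boolean isMarkedForDelete();
        int getLocalDeletionTime();
    }

    class ColumnContainer
    {
        private final ConcurrentSkipListMap<byte[], Column> columns_;
        // Side index of tombstoned column names, maintained on write.
        private final Set<byte[]> deleted_;

        ColumnContainer(Comparator<byte[]> comparator)
        {
            columns_ = new ConcurrentSkipListMap<byte[], Column>(comparator);
            deleted_ = new ConcurrentSkipListSet<byte[]>(comparator);
        }

        void addColumn(byte[] name, Column column)
        {
            columns_.put(name, column);
            if (column.isMarkedForDelete())
                deleted_.add(name);
        }

        // Visit only the known tombstones instead of iterating every
        // column and calling columns_.get() once per name.
        void removeDeleted(int gcBefore)
        {
            for (byte[] name : deleted_)
            {
                Column c = columns_.get(name);
                if (c != null && c.isMarkedForDelete()
                    && c.getLocalDeletionTime() < gcBefore)
                {
                    columns_.remove(name);
                    deleted_.remove(name);
                }
            }
        }
    }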

My patch works, and the time spent in removeDeleted() is now close to zero. But the performance doesn't seem to have noticeably improved, so I'm not sure what I'm missing here. Either my test methodology is broken or I completely misread the profile.

On Sun, May 2, 2010 at 11:01 AM, James Golick <jamesgolick@gmail.com> wrote:
Not sure why the first paragraph turned into a numbered bullet...


On Sun, May 2, 2010 at 11:00 AM, James Golick <jamesgolick@gmail.com> wrote:
I wrote the list a while back about less-than-great performance when reading thousands of columns even on cache hits. Last night, I decided to try to get to the bottom of why.

I tested this by setting the row cache capacity on a TimeUUIDType-sorted CF to 10, filling up a single row with 2000 columns, and only running queries against that row. That row was the only thing in the database. I rm -Rf'd the data before starting the test.
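For reference, the CF definition looked roughly like this (0.6-era storage-conf.xml from memory, with placeholder keyspace/CF names; the relevant bits are the comparator and the row cache size):

    <Keyspace Name="Keyspace1">
      <ColumnFamily Name="Timeline"
                    CompareWith="TimeUUIDType"
                    RowsCached="10"/>
    </Keyspace>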

The tests were done from Coda Hale's Scala client cassie, which is just a thin layer around the Java Thrift bindings. I didn't actually time each call because that wasn't the objective, but I didn't really need to. Reads of 10 columns felt quick enough, but 100 columns was slower. 1000 columns would frequently cause the client to time out. The cache hit rate on that CF was 1.0, so, yes, the row was in cache.

Doing a thousand reads with count=100 in a single thread pegged my MacBook's CPU and caused the fans to spin up pretty loud.

So, I attached a profiler and repeated the test. I'm no expert on Cassandra internals, so please let me know if I'm way off here. The profiled reads were reversed=true, count=100.
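Each profiled read boils down to roughly this against the raw 0.6 Thrift API (cassie wraps the same call; keyspace, CF, and key names are placeholders):

    import java.util.List;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class SliceRead
    {
        public static void main(String[] args) throws Exception
        {
            TTransport transport = new TSocket("localhost", 9160);
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            // Empty start/finish = the whole row; reversed=true, count=100.
            SliceRange range = new SliceRange(new byte[0], new byte[0], true, 100);
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(range);

            List<ColumnOrSuperColumn> columns = client.get_slice(
                "Keyspace1", "hot-row", new ColumnParent("Timeline"),
                predicate, ConsistencyLevel.ONE);

            System.out.println("got " + columns.size() + " columns");
            transport.close();
        }
    }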

As far as I can tell, there are three components taking up most of the time on this type of read (row slice out of cache):
  1. ColumnFamilyStore.removeDeleted() @ ~40% - Most of the time in here is actually spent materializing UUID objects so that they can be compared in the ConcurrentSkipListMap (ColumnFamily.columns_).
  2. SliceQueryFilter.getMemColumnIterator @ ~30% - Virtually all the time in here is spent in ConcurrentSkipListMap$Values.toArray().
  3. QueryFilter.collectCollatedColumns @ ~30% - All the time is spent in ColumnFamily.addColumn, with about half of the total spent materializing UUIDs for comparison.
This profile is consistent with the performance drop at higher values of count: if there are more UUIDs to deserialize, the time spent in removeDeleted() and collectCollatedColumns() should increase (roughly) linearly.

So, my question at this point is how to fix it. I have some basic ideas, but being new to Cassandra internals, I'm not sure they make any sense. Help me out here:
  1. Optionally call removeDeleted() less often. I realize that this is probably a bad idea for a lot of reasons, but it was the first thing I thought of.
  2. When a ColumnFamily object is put into the row cache, copy the columns over to another data structure that doesn't need to be sorted on get(). If columns_ needs to be kept around, this option would have a memory impact, but at least for us, it'd be well worth it for the speed. (Rough sketch after this list.)
  3. ????
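To make (2) concrete, here's the rough shape of what I mean, reusing the Column stand-in from the earlier sketch (hypothetical, not a proposed patch):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Freeze a cached row's columns into a flat array at cache-insert
    // time. A slice out of cache becomes index arithmetic on the array;
    // no comparator runs, so no UUIDs are materialized on the read path.
    final class FrozenRow
    {
        private final Column[] columns; // already in comparator order

        FrozenRow(ConcurrentSkipListMap<byte[], Column> live)
        {
            columns = live.values().toArray(new Column[0]); // one copy, paid once
        }

        // Equivalent of a reversed=true, count=n slice out of the cache.
        List<Column> sliceReversed(int n)
        {
            List<Column> out = new ArrayList<Column>(Math.min(n, columns.length));
            for (int i = columns.length - 1; i >= 0 && out.size() < n; i--)
                out.add(columns[i]);
            return out;
        }
    }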
I'd love to hear feedback on these / the rest of this (long) post.


