From: James Golick <jamesgolick@gmail.com>
To: cassandra-user@incubator.apache.org
Reply-To: user@cassandra.apache.org
Date: Sun, 2 May 2010 20:57:23 -0700
Subject: Re: Row slice / cache performance

Got a ~50% improvement by making UUID comparison less heavy-weight.
https://issues.apache.org/jira/browse/CASSANDRA-1043
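The gist, for the archives (sketched from memory; class and method names here are illustrative, not the actual CASSANDRA-1043 diff): read the version-1 timestamp straight out of the serialized UUID bytes instead of materializing two java.util.UUID objects for every comparison.

    public final class TimeUUIDBytesComparator
    {
        private static long unsigned(byte[] b, int i)
        {
            return b[i] & 0xFF;
        }

        // Reassemble the 60-bit v1 timestamp: time_hi (bytes 6-7, version
        // nibble masked off) << 48 | time_mid (bytes 4-5) << 32 |
        // time_low (bytes 0-3).
        private static long timestampOf(byte[] b)
        {
            long timeLow = unsigned(b, 0) << 24 | unsigned(b, 1) << 16
                         | unsigned(b, 2) << 8  | unsigned(b, 3);
            long timeMid = unsigned(b, 4) << 8  | unsigned(b, 5);
            long timeHi  = (unsigned(b, 6) & 0x0F) << 8 | unsigned(b, 7);
            return timeHi << 48 | timeMid << 32 | timeLow;
        }

        public static int compare(byte[] a, byte[] b)
        {
            long ta = timestampOf(a);
            long tb = timestampOf(b);
            if (ta != tb)
                return ta < tb ? -1 : 1;
            // Equal timestamps: break the tie bytewise (unsigned).
            for (int i = 0; i < 16; i++)
            {
                int d = (a[i] & 0xFF) - (b[i] & 0xFF);
                if (d != 0)
                    return d;
            }
            return 0;
        }
    }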

On Sun, May 2, 2010 at 7:49 PM, James Golick <jamesgolick@gmail.com> wrote:
Just an update on this. I wrote a patch which attempts to solve this problem by keeping an index of columns that are marked for deletion, to avoid having to iterate over the whole column set and call columns_.get() over and over again.
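In sketch form, the bookkeeping looks something like this (names are made up for illustration, and the Column interface stands in for Cassandra's IColumn; this isn't the actual diff):

    import java.util.Comparator;
    import java.util.Set;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.ConcurrentSkipListSet;

    // Stand-in for the relevant bits of Cassandra's IColumn.
    interface Column
    {
        boolean isMarkedForDelete();
        int getLocalDeletionTime();
    }

    class ColumnContainer
    {
        private final ConcurrentSkipListMap<byte[], Column> columns_;
        // Side index of tombstoned column names, maintained on write.
        private final Set<byte[]> deleted_;

        ColumnContainer(Comparator<byte[]> comparator)
        {
            columns_ = new ConcurrentSkipListMap<byte[], Column>(comparator);
            deleted_ = new ConcurrentSkipListSet<byte[]>(comparator);
        }

        void addColumn(byte[] name, Column column)
        {
            columns_.put(name, column);
            if (column.isMarkedForDelete())
                deleted_.add(name);
        }

        // Visit only the known tombstones instead of iterating every
        // column and calling columns_.get() once per name.
        void removeDeleted(int gcBefore)
        {
            for (byte[] name : deleted_)
            {
                Column c = columns_.get(name);
                if (c != null && c.isMarkedForDelete()
                    && c.getLocalDeletionTime() < gcBefore)
                {
                    columns_.remove(name);
                    deleted_.remove(name);
                }
            }
        }
    }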

My patch works, and the time spent in removeDeleted() is now close to zero. But the performance doesn't seem to have noticeably improved, so I'm not sure what I'm missing here. Either my test methodology is broken or I completely misread the profile.

On Sun, May 2, 2010 at 11:01 AM, James Golick <jamesgolick@gmail.com> wrote:
Not sure why the first paragraph turned into a numbered bullet...


On Sun, May 2, 2010 at 11:00 AM, James Golick <jamesgolick@gmail.com> wrote:
I wrote the list a while back about less-than-great performance when reading thousands of columns even on cache hits. Last night, I decided to try to get to the bottom of why.

I tested this by setting the row cache capacity on a TimeUUIDType-sorted CF to 10, filling up a single row with 2000 columns, and only running queries against that row. That row was the only thing in the database. I rm -Rf'd the data before starting the test.
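For reference, the CF definition looked roughly like this (0.6-era storage-conf.xml from memory, with placeholder keyspace/CF names; the relevant bits are the comparator and the row cache size):

    <Keyspace Name="Keyspace1">
      <ColumnFamily Name="Timeline"
                    CompareWith="TimeUUIDType"
                    RowsCached="10"/>
    </Keyspace>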

The tests were done from Coda Hale's Scala client cassie, which is just a thin layer around the Java Thrift bindings. I didn't actually time each call because that wasn't the objective, but I didn't really need to. Reads of 10 columns felt quick enough, but 100 columns was slower. 1000 columns would frequently cause the client to time out. The cache hit rate on that CF was 1.0, so, yes, the row was in cache.

Doing a thousand reads with count=100 in a single thread pegged my MacBook's CPU and caused the fans to spin up pretty loud.

So, I attached a profiler and repeated the test. I'm no expert on Cassandra internals, so please let me know if I'm way off here. The profiled reads were reversed=true, count=100.
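Each profiled read boils down to roughly this against the raw 0.6 Thrift API (cassie wraps the same call; keyspace, CF, and key names are placeholders):

    import java.util.List;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class SliceRead
    {
        public static void main(String[] args) throws Exception
        {
            TTransport transport = new TSocket("localhost", 9160);
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            // Empty start/finish = the whole row; reversed=true, count=100.
            SliceRange range = new SliceRange(new byte[0], new byte[0], true, 100);
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(range);

            List<ColumnOrSuperColumn> columns = client.get_slice(
                "Keyspace1", "hot-row", new ColumnParent("Timeline"),
                predicate, ConsistencyLevel.ONE);

            System.out.println("got " + columns.size() + " columns");
            transport.close();
        }
    }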

As far as I can tell, there are three components taking up most of the time on this type of read (row slice out of cache):
  1. ColumnFamilyStore.removeDeleted() @ ~40% - Most of the time in here is actually spent materializing UUID objects so that they can be compared in the ConcurrentSkipListMap (ColumnFamily.columns_).
  2. SliceQueryFilter.getMemColumnIterator @ ~30% - Virtually all the time in here is spent in ConcurrentSkipListMap$Values.toArray().
  3. QueryFilter.collectCollatedColumns @ ~30% - All the time is spent in ColumnFamily.addColumn, with about half of the total spent materializing UUIDs for comparison.
This profile is consistent with the performance drop at higher values of count: if there are more UUIDs to deserialize, the time spent in removeDeleted() and collectCollatedColumns() should increase (roughly) linearly.

So, my question at this point is how to fix it. I have some basic ideas, but being new to Cassandra internals, I'm not sure they make any sense. Help me out here:
  1. Optionally call removeDeleted() less often. I realize that this is probably a bad idea for a lot of reasons, but it was the first thing I thought of.
  2. When a ColumnFamily object is put into the row cache, copy the columns over to another data structure that doesn't need to be sorted on get(). If columns_ needs to be kept around, this option would have a memory impact, but at least for us, it'd be well worth it for the speed. (Rough sketch after this list.)
  3. ????
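To make (2) concrete, here's the rough shape of what I mean, reusing the Column stand-in from the earlier sketch (hypothetical, not a proposed patch):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Freeze a cached row's columns into a flat array at cache-insert
    // time. A slice out of cache becomes index arithmetic on the array;
    // no comparator runs, so no UUIDs are materialized on the read path.
    final class FrozenRow
    {
        private final Column[] columns; // already in comparator order

        FrozenRow(ConcurrentSkipListMap<byte[], Column> live)
        {
            columns = live.values().toArray(new Column[0]); // one copy, paid once
        }

        // Equivalent of a reversed=true, count=n slice out of the cache.
        List<Column> sliceReversed(int n)
        {
            List<Column> out = new ArrayList<Column>(Math.min(n, columns.length));
            for (int i = columns.length - 1; i >= 0 && out.size() < n; i--)
                out.add(columns[i]);
            return out;
        }
    }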
I'd love to hear feedback on these / the rest of this (long) post.


