From: James Golick <jamesgolick@gmail.com>
To: cassandra-user@incubator.apache.org
Date: Sun, 2 May 2010 19:49:34 -0700
Subject: Re: Row slice / cache performance

Just an update on this. I wrote a patch which attempts to solve this problem by keeping an index of columns that are marked for deletion, to avoid having to iterate over the whole column set and call columns_.get() over and over again.
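
To make that concrete, here is a rough sketch of the shape of the idea. The class, field, and method names are made up for illustration; this is not the actual patch or Cassandra's real ColumnFamily code:

import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;

class ColumnFamilySketch {
    // Sorted column map, playing the role of ColumnFamily.columns_.
    private final ConcurrentSkipListMap<byte[], ColumnSketch> columns_;
    // Side index of the names of columns that are marked for deletion.
    private final ConcurrentSkipListSet<byte[]> deletedNames_;

    ColumnFamilySketch(Comparator<byte[]> comparator) {
        columns_ = new ConcurrentSkipListMap<byte[], ColumnSketch>(comparator);
        deletedNames_ = new ConcurrentSkipListSet<byte[]>(comparator);
    }

    void addColumn(byte[] name, ColumnSketch column) {
        columns_.put(name, column);
        if (column.isMarkedForDelete())
            deletedNames_.add(name);
    }

    // Only walk the tombstoned names instead of every column in the row.
    void removeDeleted(int gcBefore) {
        for (byte[] name : deletedNames_) {
            ColumnSketch c = columns_.get(name);
            if (c == null || !c.isMarkedForDelete()) {
                deletedNames_.remove(name);      // stale index entry
            } else if (c.getLocalDeletionTime() < gcBefore) {
                columns_.remove(name);           // tombstone is old enough to drop
                deletedNames_.remove(name);
            }
            // otherwise keep it indexed for a later pass
        }
    }

    interface ColumnSketch {
        boolean isMarkedForDelete();
        int getLocalDeletionTime();
    }
}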

My patch works, and the time spent in removeDeleted() is now close to zero. But the performance doesn't seem to have noticeably improved, so I'm not sure what I'm missing here. Either my test methodology is broken or I completely misread the profile.

On Sun, May 2, 2010 at 11:01 AM, James Golick <jamesgolick@gmail.com> wrote:
> Not sure why the first paragraph turned into a numbered bullet...


> On Sun, May 2, 2010 at 11:00 AM, James Golick <jamesgolick@gmail.com> wrote:
>> 1. I wrote the list a while back about less-than-great performance when reading thousands of columns even on cache hits. Last night, I decided to try to get to the bottom of why.

>> I tested this by setting the row cache capacity on a TimeUUIDType-sorted CF to 10, filling up a single row with 2000 columns, and only running queries against that row. That row was the only thing in the database. I rm -Rf'd the data before starting the test.

>> The tests were done from Coda Hale's Scala client cassie, which is just a thin layer around the Java Thrift bindings. I didn't actually time each call because that wasn't the objective, but I didn't really need to. Reads of 10 columns felt quick enough, but 100 columns was slower. 1000 columns would frequently cause the client to time out. The cache hit rate on that CF was 1.0, so, yes, the row was in cache.

>> Doing a thousand reads with count=100 in a single thread pegged my MacBook's CPU and caused the fans to spin up pretty loud.

>> So, I attached a profiler and repeated the test. I'm no expert on Cassandra internals, so please let me know if I'm way off here. The profiled reads were reversed=true, count=100.
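
For reference, each profiled read is roughly the following against the raw Thrift API. I'm writing this from memory of the 0.6-era interface, so treat the exact signatures as approximate, and the keyspace, column family, and key names are placeholders:

import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class SliceRead {
    public static void main(String[] args) throws Exception {
        TSocket socket = new TSocket("localhost", 9160);
        socket.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));

        // Reverse slice of the newest 100 columns, the same shape as the profiled reads.
        SliceRange range = new SliceRange();
        range.setStart(new byte[0]);
        range.setFinish(new byte[0]);
        range.setReversed(true);
        range.setCount(100);
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);

        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("Timeline");

        List<ColumnOrSuperColumn> columns =
                client.get_slice("Keyspace1", "the-hot-row", parent, predicate, ConsistencyLevel.ONE);
        System.out.println("got " + columns.size() + " columns");

        socket.close();
    }
}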

>> As far as I can tell, there are three components taking up most of the time on this type of read (row slice out of cache):
>> 1. ColumnFamilyStore.removeDeleted() @ ~40% - Most of the time in here is actually spent materializing UUID objects so that they can be compared in the ConcurrentSkipListMap (ColumnFamily.columns_).
>> 2. SliceQueryFilter.getMemColumnIterator @ ~30% - Virtually all the time in here is spent in ConcurrentSkipListMap$Values.toArray().
>> 3. QueryFilter.collectCollatedColumns @ ~30% - All the time being spent in ColumnFamily.addColumn, and about half of the total spent materializing UUIDs for comparison.
>> This profile is consistent with the decrease in performance at higher values for count. If there are more UUIDs to deserialize, the time spent in removeDeleted() and collectCollatedColumns() should increase (roughly) linearly.
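
As a toy illustration of where that cost comes from (this is not Cassandra's TimeUUIDType code, just the shape of the problem): every comparison the skip list performs has to rebuild UUID objects from the raw column-name bytes, and every get() or put() performs O(log n) comparisons.

import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.UUID;
import java.util.concurrent.ConcurrentSkipListMap;

public class UuidCompareCost {
    // Comparator over raw 16-byte column names that materializes a UUID for each
    // argument on every single compare() call.
    static final Comparator<byte[]> BY_UUID = new Comparator<byte[]>() {
        public int compare(byte[] a, byte[] b) {
            return toUuid(a).compareTo(toUuid(b)); // two allocations per comparison
        }
    };

    static UUID toUuid(byte[] raw) {
        ByteBuffer bb = ByteBuffer.wrap(raw);
        return new UUID(bb.getLong(), bb.getLong());
    }

    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<byte[], String> columns =
                new ConcurrentSkipListMap<byte[], String>(BY_UUID);
        for (int i = 0; i < 2000; i++)
            columns.put(toBytes(UUID.randomUUID()), "value-" + i);
        // Each get() pays roughly log2(2000) ~= 11 comparisons, so ~22 UUID
        // allocations; a removeDeleted()-style pass that calls get() for every
        // column multiplies that by the column count.
        System.out.println(columns.size());
    }
}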

>> So, my question at this point is how to fix it. I have some basic ideas, but being new to Cassandra internals, I'm not sure they make any sense. Help me out here:
>> 1. Optionally call removeDeleted() less often. I realize that this is probably a bad idea for a lot of reasons, but it was the first thing I thought of.
>> 2. When a ColumnFamily object is put into the row cache, copy the columns over to another data structure that doesn't need to be sorted on get() (roughly what the sketch after this list shows). If columns_ needs to be kept around, this option would have a memory impact, but at least for us it'd be well worth it for the speed.
>> 3. ????
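
For idea 2, this is the kind of structure I have in mind for the cached copy. It's a completely made-up class, not the actual row cache: snapshot the columns into flat lists once when the row goes into the cache, so a reversed slice of N columns is just an index walk, with no comparator calls and no UUID deserialization.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

class CachedRowSnapshot {
    private final List<byte[]> names = new ArrayList<byte[]>();
    private final List<byte[]> values = new ArrayList<byte[]>();

    // Built once when the ColumnFamily is put into the row cache; the skip list's
    // iteration order already yields the columns sorted by the comparator.
    CachedRowSnapshot(ConcurrentSkipListMap<byte[], byte[]> columns) {
        for (Map.Entry<byte[], byte[]> e : columns.entrySet()) {
            names.add(e.getKey());
            values.add(e.getValue());
        }
    }

    // Reverse slice of up to `count` column values starting from the newest.
    List<byte[]> reverseSlice(int count) {
        List<byte[]> out = new ArrayList<byte[]>(count);
        for (int i = values.size() - 1; i >= 0 && out.size() < count; i--)
            out.add(values.get(i));
        return out;
    }
}
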
>> I'd love to hear feedback on these / the rest of this (long) post.

