From: Dan Kuebrich
Date: Wed, 25 May 2011 16:10:05 -0400
To: user@cassandra.apache.org
Subject: Re: Priority queue in a single row - performance falls over time

It sounds like the problem is that the row is filling up with tombstones and becoming enormous. Another idea, then, which might not be worth the added complexity, is to progressively use new rows. Depending on volume, this could mean having 5-minute-window rows, or 1-minute, or whatever works best.

Read: Assuming you're not falling behind, you only need to query the row that the current time falls in and the one immediately prior. If you do fall behind, you'll have to walk backwards through buckets until you find them empty.

Write: Write the column to the bucket (row) that corresponds to the correct time window.

Delete: Delete the column from the row it was read from. When all columns in a row are deleted, the row can be GC'd.

Again, Cassandra might not be the correct datastore.
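The bucketing scheme above can be sketched roughly like this. This is a hypothetical illustration, not code from the thread: the `bucket_key` naming, the 5-minute `WINDOW_SECS`, and the plain-dict stand-in for a column family are all assumptions, and a real implementation would walk further back than one bucket if it had fallen behind.

```python
WINDOW_SECS = 300  # assumed 5-minute buckets; tune to volume

def bucket_key(ts):
    """Row key for the time window containing timestamp ts (hypothetical naming)."""
    return "queue:%d" % (int(ts) // WINDOW_SECS)

# Stand-in for a column family: row_key -> {column_time: value}
store = {}

def push(ts, item):
    """Write: put the column in the bucket (row) for its time window."""
    store.setdefault(bucket_key(ts), {})[ts] = item

def pop_due(now):
    """Read: check the immediately-prior bucket, then the current one,
    returning and deleting columns whose time has passed.
    (If you fall behind, you'd keep walking back until buckets are empty.)"""
    due = []
    for key in (bucket_key(now - WINDOW_SECS), bucket_key(now)):
        row = store.get(key, {})
        for ts in sorted(t for t in row if t <= now):
            due.append((ts, row.pop(ts)))
        if not row:                  # Delete: once all columns are gone,
            store.pop(key, None)     # the whole row can be GC'd
    return due

push(100, "a")
push(450, "b")
ready = pop_due(460)  # drains the prior bucket, then the current one
```

The point of the per-window rows is that a fully drained bucket is simply dropped, rather than one ever-growing row accumulating deletes forever.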
On Wed, May 25, 2011 at 3:56 PM, Jonathan Ellis wrote:
> You're basically intentionally inflicting the worst-case scenario on
> the Cassandra storage engine:
> http://wiki.apache.org/cassandra/DistributedDeletes
>
> You could play around with reducing gc_grace_seconds, but a PQ with
> "millions" of items is something you should probably just do in memory
> these days.
>
> On Wed, May 25, 2011 at 10:43 AM, wrote:
> >
> > Hi all,
> >
> > I'm trying to implement a priority queue for holding a large number
> > (millions) of items that need to be processed in time order. My solution
> > works, but gets slower and slower until performance is unacceptable,
> > even with a small number of items.
> >
> > Each item essentially needs to be popped off the queue (some arbitrary
> > work is then done) and then the item is returned to the queue with a new
> > timestamp indicating when it should be processed again. We thus cycle
> > through all work items eventually, but some may come around more
> > frequently than others.
> >
> > I am implementing this as a single Cassandra row, in a CF with a TimeUUID
> > comparator.
> >
> > Each column name is a TimeUUID, with an arbitrary column value describing
> > the work item; the columns are thus sorted in time order.
> >
> > To pop items, I do a get() such as:
> >
> >   cf.get(row_key, column_finish=now, column_start=yesterday, column_count=1000)
> >
> > to get all the items at the head of the queue (if any) whose scheduled
> > time has passed (i.e. is not later than the current system time).
> >
> > For each item retrieved, I do a delete to remove the old column, then an
> > insert with a fresh TimeUUID column name (system time + arbitrary
> > increment), thus putting the item back somewhere in the queue (currently,
> > the back of the queue).
> >
> > I do a batch_mutate for all these deletes and inserts, with a queue size
> > of 2000. These are currently interleaved, i.e.
> > delete1-insert1-delete2-insert2...
> >
> > This all appears to work correctly, but performance starts at around
> > 8000 cycles/sec, falls to around 1800/sec over the first 250K cycles,
> > and continues to fall over time, down to about 150/sec after a few
> > million cycles. This happens regardless of the overall size of the row
> > (I have tried sizes from 1,000 to 100,000 items). My target performance
> > is 1000 cycles/sec (but my data store will need to handle other work
> > concurrently).
> >
> > I am currently using just a single node running on localhost, with a
> > pycassa client, on a 4-core, 4GB machine running Fedora 14.
> >
> > Is this expected behaviour (is there just too much churn for a single
> > row to perform well), or am I doing something wrong?
> >
> > Would https://issues.apache.org/jira/browse/CASSANDRA-2583 in version
> > 0.8.1 fix this problem (I am using version 0.7.6)?
> >
> > Thanks!
> >
> > David.
> >
> > ----------------------------------------------------------------
> > This message was sent using IMP, the Internet Messaging Program.
> >
> > This email and any attachments to it may be confidential and are
> > intended solely for the use of the individual to whom it is addressed.
> > If you are not the intended recipient of this email, you must neither
> > take any action based upon its contents, nor copy or show it to anyone.
> > Please contact the sender if you believe you have received this email in
> > error. QinetiQ may monitor email traffic data and also the content of
> > email for the purposes of security. QinetiQ Limited (Registered in
> > England & Wales: Company Number: 3796233) Registered office: Cody
> > Technology Park, Ively Road, Farnborough, Hampshire, GU14 0LX
> > http://www.qinetiq.com.
> >
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
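For context on why the single-row design degrades, the pop/reinsert cycle David describes can be modeled with a toy sketch. This is a simplified illustration, not pycassa or Cassandra internals: the `Row` class and its scan counter are assumptions made to show one mechanism, namely that deletes leave tombstones at the head of the row that every subsequent read must still scan past until gc_grace_seconds elapses and compaction purges them.

```python
import bisect

# Toy model of one wide row: deleting a column only marks it; the data
# is still scanned by reads until it is eventually purged.
class Row:
    def __init__(self):
        self.columns = []          # sorted (time, value) pairs
        self.tombstones = set()    # times of deleted columns

    def insert(self, t, value):
        bisect.insort(self.columns, (t, value))

    def delete(self, t):
        self.tombstones.add(t)     # mark deleted; data stays in place

    def get_due(self, now):
        """Return live columns with t <= now, plus how many columns
        (live or tombstoned) the read had to scan to find them."""
        scanned, due = 0, []
        for t, value in self.columns:
            if t > now:
                break
            scanned += 1
            if t not in self.tombstones:
                due.append((t, value))
        return due, scanned

row = Row()
for i in range(1000):
    row.insert(i, "item%d" % i)

due, before = row.get_due(now=999)   # first pop: 1000 live columns
for t, v in due:                     # one pop/reinsert cycle
    row.delete(t)
    row.insert(t + 10000, v)
empty, after = row.get_due(now=999)  # second pop: nothing is live,
# but the read scans just as many columns as before (all tombstones),
# so each cycle makes the head of the queue more expensive to read.
```

In the model, every cycle adds another layer of tombstones in front of the live data, which matches the observed pattern of throughput falling steadily with total cycles rather than with row size.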