From: Dan Kuebrich
Date: Wed, 25 May 2011 16:10:05 -0400
To: user@cassandra.apache.org
Subject: Re: Priority queue in a single row - performance falls over time

It sounds like the problem is that the row is filling up with tombstones and becoming enormous. Another idea, then, which might not be worth the added complexity, is to progressively use new rows. Depending on volume, this could mean having 5-minute-window rows, or 1-minute, or whatever works best.

Read: Assuming you're not falling behind, you only need to query the row that the current time falls in and the one immediately prior. If you do fall behind, you'll have to walk backwards through buckets until you find them empty.

Write: Write the column to the bucket (row) that corresponds to the correct time window.

Delete: Delete the column from the row it was read from. When all columns in a row are deleted, the row can be GC'd.

Again, Cassandra might not be the correct datastore.
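The bucketing scheme above can be sketched roughly like this. This is a hypothetical illustration, not code from the thread: the `bucket_key` naming, the 5-minute `WINDOW_SECS`, and the plain-dict stand-in for a column family are all assumptions, and a real implementation would walk further back than one bucket if it had fallen behind.

```python
WINDOW_SECS = 300  # assumed 5-minute buckets; tune to volume

def bucket_key(ts):
    """Row key for the time window containing timestamp ts (hypothetical naming)."""
    return "queue:%d" % (int(ts) // WINDOW_SECS)

# Stand-in for a column family: row_key -> {column_time: value}
store = {}

def push(ts, item):
    """Write: put the column in the bucket (row) for its time window."""
    store.setdefault(bucket_key(ts), {})[ts] = item

def pop_due(now):
    """Read: check the immediately-prior bucket, then the current one,
    returning and deleting columns whose time has passed.
    (If you fall behind, you'd keep walking back until buckets are empty.)"""
    due = []
    for key in (bucket_key(now - WINDOW_SECS), bucket_key(now)):
        row = store.get(key, {})
        for ts in sorted(t for t in row if t <= now):
            due.append((ts, row.pop(ts)))
        if not row:                  # Delete: once all columns are gone,
            store.pop(key, None)     # the whole row can be GC'd
    return due

push(100, "a")
push(450, "b")
ready = pop_due(460)  # drains the prior bucket, then the current one
```

The point of the per-window rows is that a fully drained bucket is simply dropped, rather than one ever-growing row accumulating deletes forever.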
On Wed, May 25, 2011 at 3:56 PM, Jonathan Ellis wrote:
> You're basically intentionally inflicting the worst-case scenario on
> the Cassandra storage engine:
> http://wiki.apache.org/cassandra/DistributedDeletes
>
> You could play around with reducing gc_grace_seconds, but a PQ with
> "millions" of items is something you should probably just do in memory
> these days.
>
> On Wed, May 25, 2011 at 10:43 AM, wrote:
> >
> > Hi all,
> >
> > I'm trying to implement a priority queue for holding a large number
> > (millions) of items that need to be processed in time order. My solution
> > works, but gets slower and slower until performance is unacceptable,
> > even with a small number of items.
> >
> > Each item essentially needs to be popped off the queue (some arbitrary
> > work is then done) and then the item is returned to the queue with a new
> > timestamp indicating when it should be processed again. We thus cycle
> > through all work items eventually, but some may come around more
> > frequently than others.
> >
> > I am implementing this as a single Cassandra row, in a CF with a TimeUUID
> > comparator.
> >
> > Each column name is a TimeUUID, with an arbitrary column value describing
> > the work item; the columns are thus sorted in time order.
> >
> > To pop items, I do a get() such as:
> >
> >   cf.get(row_key, column_finish=now, column_start=yesterday, column_count=1000)
> >
> > to get all the items at the head of the queue (if any) whose scheduled
> > time has passed (i.e. is not later than the current system time).
> >
> > For each item retrieved, I do a delete to remove the old column, then an
> > insert with a fresh TimeUUID column name (system time + arbitrary
> > increment), thus putting the item back somewhere in the queue (currently,
> > the back of the queue).
> >
> > I do a batch_mutate for all these deletes and inserts, with a queue size
> > of 2000. These are currently interleaved, i.e.
> > delete1-insert1-delete2-insert2...
> >
> > This all appears to work correctly, but performance starts at around
> > 8000 cycles/sec, falls to around 1800/sec over the first 250K cycles,
> > and continues to fall over time, down to about 150/sec after a few
> > million cycles. This happens regardless of the overall size of the row
> > (I have tried sizes from 1,000 to 100,000 items). My target performance
> > is 1000 cycles/sec (but my data store will need to handle other work
> > concurrently).
> >
> > I am currently using just a single node running on localhost, with a
> > pycassa client, on a 4-core, 4GB machine running Fedora 14.
> >
> > Is this expected behaviour (is there just too much churn for a single
> > row to perform well), or am I doing something wrong?
> >
> > Would https://issues.apache.org/jira/browse/CASSANDRA-2583 in version
> > 0.8.1 fix this problem (I am using version 0.7.6)?
> >
> > Thanks!
> >
> > David.
> >
> > ----------------------------------------------------------------
> > This message was sent using IMP, the Internet Messaging Program.
> >
> > This email and any attachments to it may be confidential and are
> > intended solely for the use of the individual to whom it is addressed.
> > If you are not the intended recipient of this email, you must neither
> > take any action based upon its contents, nor copy or show it to anyone.
> > Please contact the sender if you believe you have received this email in
> > error. QinetiQ may monitor email traffic data and also the content of
> > email for the purposes of security. QinetiQ Limited (Registered in
> > England & Wales: Company Number: 3796233) Registered office: Cody
> > Technology Park, Ively Road, Farnborough, Hampshire, GU14 0LX
> > http://www.qinetiq.com.
> >
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
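For context on why the single-row design degrades, the pop/reinsert cycle David describes can be modeled with a toy sketch. This is a simplified illustration, not pycassa or Cassandra internals: the `Row` class and its scan counter are assumptions made to show one mechanism, namely that deletes leave tombstones at the head of the row that every subsequent read must still scan past until gc_grace_seconds elapses and compaction purges them.

```python
import bisect

# Toy model of one wide row: deleting a column only marks it; the data
# is still scanned by reads until it is eventually purged.
class Row:
    def __init__(self):
        self.columns = []          # sorted (time, value) pairs
        self.tombstones = set()    # times of deleted columns

    def insert(self, t, value):
        bisect.insort(self.columns, (t, value))

    def delete(self, t):
        self.tombstones.add(t)     # mark deleted; data stays in place

    def get_due(self, now):
        """Return live columns with t <= now, plus how many columns
        (live or tombstoned) the read had to scan to find them."""
        scanned, due = 0, []
        for t, value in self.columns:
            if t > now:
                break
            scanned += 1
            if t not in self.tombstones:
                due.append((t, value))
        return due, scanned

row = Row()
for i in range(1000):
    row.insert(i, "item%d" % i)

due, before = row.get_due(now=999)   # first pop: 1000 live columns
for t, v in due:                     # one pop/reinsert cycle
    row.delete(t)
    row.insert(t + 10000, v)
empty, after = row.get_due(now=999)  # second pop: nothing is live,
# but the read scans just as many columns as before (all tombstones),
# so each cycle makes the head of the queue more expensive to read.
```

In the model, every cycle adds another layer of tombstones in front of the live data, which matches the observed pattern of throughput falling steadily with total cycles rather than with row size.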