From: Vram Kouramajian
To: user@cassandra.apache.org
Date: Sat, 26 Jun 2010 15:24:00 -0700
Subject: Re: Distributed work-queues?

We have implemented a distributed queue (similar to AWS SQS) and a job queue in Cassandra.

Vram

On Sat, Jun 26, 2010 at 1:56 PM, Andrew Miklas wrote:
> Hi all,
>
> Has anyone written a work-queue implementation using Cassandra?
>
> There's a section in the UseCase wiki page for "A distributed Priority Job
> Queue" which looks perfect, but unfortunately it hasn't been filled in yet.
> http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue
>
> I've been thinking about how best to do this, but every solution I've
> thought of seems to have some serious drawback. The "range ghost" problem
> in particular creates some issues. I'm assuming each job has a row within
> some column family, where the row's key is the time at which the job should
> be run. To find the next job, you'd do a range query with a start a few
> hours in the past, and an end at the current time. Once a job is completed,
> you delete the row.
>
> The problem here is that you have to scan through deleted-but-not-yet-GCed
> rows each time you run the query. Is there a better way?
>
> Preventing more than one worker from starting the same job seems like it
> would be a problem too. You'd either need an external locking manager, or
> have to use some other protocol where workers write their ID into the row
> and then immediately read it back to confirm that they are the owner of the
> job.
>
> Any ideas here? Has anyone come up with a nice implementation? Is
> Cassandra not well suited for queue-like tasks?
>
> Thanks,
>
> Andrew

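The design Andrew describes in the quoted message can be sketched as follows. This is a minimal illustration only, assuming a plain in-memory dict in place of a column family; `JobTable`, `due_jobs`, `write_owner`, `confirm_owner`, and `complete` are hypothetical names and not part of any Cassandra client API. It shows two things from the message: a range scan still visits deleted-but-not-yet-GCed rows, and a worker claims a job by writing its ID and reading it back.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Row:
    payload: str
    owner: Optional[str] = None  # last worker ID written to the row
    deleted: bool = False        # tombstone: deleted but not yet garbage-collected


class JobTable:
    """Toy in-memory stand-in for a column family keyed by run time (an assumption)."""

    def __init__(self) -> None:
        self.rows: Dict[int, Row] = {}
        self.rows_scanned = 0  # counts every row visited by range scans

    def add_job(self, run_at: int, payload: str) -> None:
        self.rows[run_at] = Row(payload)

    def due_jobs(self, start: int, end: int) -> List[int]:
        """Range scan over [start, end]; tombstoned rows are still visited."""
        due = []
        for key in sorted(self.rows):
            if start <= key <= end:
                self.rows_scanned += 1
                if not self.rows[key].deleted:
                    due.append(key)
        return due

    def write_owner(self, run_at: int, worker_id: str) -> None:
        """Claim step 1: write our ID into the row (last write wins)."""
        self.rows[run_at].owner = worker_id

    def confirm_owner(self, run_at: int, worker_id: str) -> bool:
        """Claim step 2: read the row back and check the ID is still ours."""
        return self.rows[run_at].owner == worker_id

    def complete(self, run_at: int) -> None:
        """Deleting a finished job leaves a tombstone behind."""
        self.rows[run_at].deleted = True


table = JobTable()
for t in (100, 110, 120):
    table.add_job(t, f"job-{t}")

# Two workers race for the job at t=100; worker-b happens to write last.
table.write_owner(100, "worker-a")
table.write_owner(100, "worker-b")
a_owns = table.confirm_owner(100, "worker-a")
b_owns = table.confirm_owner(100, "worker-b")

table.complete(100)
remaining = table.due_jobs(start=0, end=200)
```

In this interleaving only the last writer confirms ownership, and the scan still touches the tombstoned row for the completed job. The sketch does not close the race in which a worker reads its own ID back before a competing write arrives, and it ignores replication and consistency levels, so a real implementation would need more than this.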