From: "Brian O'Neill"
To: user@cassandra.apache.org, philipomailbox-cass@yahoo.com
Date: Fri, 3 Aug 2012 14:27:10 -0400
Subject: Re: How to process new rows in parallel?
In-Reply-To: <1344017886.28730.YahooMailClassic@web125704.mail.ne1.yahoo.com>

If you are deleting the messages after processing, it sounds like you are using Cassandra as a work queue.

Here are some links for implementing a distributed queue in Cassandra:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Distributed-work-queues-td5226248.html
http://comments.gmane.org/gmane.comp.db.cassandra.user/16633

There is a placeholder on the use cases wiki for this, but no info:
http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue

We were looking to do the same thing, but in the end decided to go with Kafka. Given your throughput requirements, Kafka might be a good option for you as well.

-brian

On Fri, Aug 3, 2012 at 2:18 PM, Philip Nelson wrote:
> Hello,
>
> I am using a Column Family in Cassandra to store incoming messages, which arrive at a high rate (100s of thousands per second). I then have a process wake up periodically to work on those messages, and then delete them. I'd like to understand how I could have multiple processes running, each pulling off a bunch of messages in parallel.
> It would be nice to be able to add processes dynamically, and not have to explicitly assign message ranges to various processes.
>
> Any suggestions on how to ensure that each process pulls off a different bunch of messages? Any recommended design patterns? I was going to look at qsandra too, for inspiration. Would this be worthwhile?
>
> If this was a relational database, I would have the processes lock the table (or perhaps a row), set flags on a row indicating that it's being "processed", and then unlock. Processes would choose messages by SELECTing on unflagged messages. I'm not sure how this might map to Cassandra. I realise it may not. Even if I configure the cluster such that setting a flag on a row requires all nodes to be written, two processes could still race setting that flag, right?
>
> I am open to the idea that it might help to store the messages in wide rows, if that helps.
>
> Thanks,
>
> Philip

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
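
The race Philip asks about in the quoted message can be shown without a cluster. The sketch below is plain Python, not Cassandra client code: `FlagStore`, `read_flag`, `write_flag` and `interleaved_claims` are hypothetical names standing in for a row with a "processing" column, and the two workers' steps are interleaved by hand. It assumes each worker does a read followed by a separate write, with nothing making the pair atomic.

```python
class FlagStore:
    """Hypothetical stand-in for a row holding a 'processing' flag."""

    def __init__(self):
        self.flags = {}

    def read_flag(self, msg_id):
        return self.flags.get(msg_id)

    def write_flag(self, msg_id, worker):
        # Last write wins; an earlier writer is silently overwritten.
        self.flags[msg_id] = worker


def interleaved_claims(store, msg_id, workers):
    """Simulate the unlucky schedule: every worker reads before any writes."""
    saw_unflagged = [w for w in workers if store.read_flag(msg_id) is None]
    for w in saw_unflagged:
        store.write_flag(msg_id, w)
    # Every worker in this list believes it owns the message.
    return saw_unflagged


store = FlagStore()
claimed_by = interleaved_claims(store, "msg-1", ["worker-a", "worker-b"])
print(claimed_by)            # workers that think they claimed msg-1
print(store.flags["msg-1"])  # the single owner the store actually recorded
```

However many replicas acknowledge each write, the read and the write remain two separate steps, so a stricter write consistency level on its own does not close this window.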
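
One way to let processes come and go without hand-assigning message ranges is to bucket messages into a fixed number of partitions and recompute which worker owns which partition whenever the worker list changes. This is roughly the idea behind partitioned consumers in systems like Kafka, but the sketch below is not Kafka's API: `NUM_PARTITIONS`, `partition_for` and `assign` are hypothetical names, and how the workers learn the current worker list is deliberately left out.

```python
import zlib

NUM_PARTITIONS = 8  # assumption: chosen up front and never changed


def partition_for(msg_key: str) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(msg_key.encode("utf-8")) % NUM_PARTITIONS


def assign(workers):
    """Map each partition to exactly one worker, round-robin over sorted names."""
    ordered = sorted(workers)
    if not ordered:
        raise ValueError("need at least one worker")
    return {p: ordered[p % len(ordered)] for p in range(NUM_PARTITIONS)}


two = assign(["worker-a", "worker-b"])
three = assign(["worker-a", "worker-b", "worker-c"])  # a third worker joins

# The worker that should process this message under the two-worker assignment.
print(two[partition_for("msg-42")])
# After the join, all three workers own at least one partition.
print(sorted(set(three.values())))
```

Each message key maps to one partition and each partition to one worker, so two workers never pull the same message as long as they agree on the worker list. Reaching that agreement while membership is changing is the hard part and has to come from somewhere else.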