From: Mark Schnitzius
To: user@cassandra.apache.org
Subject: Re: Hadoop over Cassandra
Date: Wed, 19 May 2010 14:40:46 +1000

> If anyone has "war stories" on the topic of Cassandra & Hadoop (or
> even just Hadoop in general) let me know.

Don't know if it counts as a war story, but I was successful recently in implementing something I got advice on in an earlier thread, namely feeding both a Cassandra table and a Hadoop sequence file into the same map/reduce process and updating the same Cassandra table with the results. I used the approach I mentioned before, of creating an InputFormat that returns splits from both (and creating a RecordReader that massages the Cass data into the same format as the sequence file data). I'll write something up about it for the wiki, when I can find some time.

My chief concern with it, though, is gracefully handling a map/reduce failure. As Cassandra isn't transactional, the table may end up partially updated, which is a problem, at least in the domain I'm working in. So now I'm trying to come up with a way to effect Cassandra transactions via column naming conventions or indexes or something like that. I'd be curious to hear if anyone here has ever implemented a solution for something similar before...

Thanks

Mark
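To illustrate the combined-input idea described above, here is a minimal sketch in plain Python rather than the Hadoop API. It only models the shape of the approach: one function returns splits from both sources, and one reader turns either kind of split into the same (key, value) record form. The names SeqSplit, CassSplit, get_splits and read_records are hypothetical and are not Hadoop or Cassandra classes; the in-memory lists and dicts stand in for real storage.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple, Union

# Target record shape: the (key, value) pairs a sequence file would yield.
Record = Tuple[str, str]


@dataclass
class SeqSplit:
    """Hypothetical split over sequence-file data (already in record shape)."""
    pairs: List[Record]


@dataclass
class CassSplit:
    """Hypothetical split over Cassandra rows: row key -> {column: value}."""
    rows: Dict[str, Dict[str, str]]
    column: str  # the one column to expose as the record value


Split = Union[SeqSplit, CassSplit]


def get_splits(seq_chunks: List[List[Record]],
               cass_chunks: List[Dict[str, Dict[str, str]]],
               column: str) -> List[Split]:
    """Return the splits of both sources as a single list."""
    splits: List[Split] = [SeqSplit(chunk) for chunk in seq_chunks]
    splits.extend(CassSplit(chunk, column) for chunk in cass_chunks)
    return splits


def read_records(split: Split) -> Iterator[Record]:
    """Yield records in one common shape, whatever the split's source."""
    if isinstance(split, SeqSplit):
        yield from split.pairs
    else:
        for row_key, columns in split.rows.items():
            # Rows that lack the requested column are skipped.
            if split.column in columns:
                yield row_key, columns[split.column]
```

With this shape, the map step never needs to know which source a record came from; it just iterates read_records(split) for each split it is handed.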

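One possible reading of "transactions via column naming conventions" is sketched below. It is not a solution from this thread, only an assumption-laden illustration: each job writes its results under column names prefixed with its job id, and a single marker column is flipped at the very end. Readers look up the marker first, so a job that fails halfway leaves only ignored columns behind. The table is modelled as nested Python dicts, and META_ROW, COMMIT_COLUMN and all function names are hypothetical.

```python
from typing import Dict, Optional

# In-memory stand-in for a column family: row key -> {column name: value}.
Table = Dict[str, Dict[str, str]]

META_ROW = "_meta"               # hypothetical control row
COMMIT_COLUMN = "committed_job"  # hypothetical marker column


def staged_name(job_id: str, column: str) -> str:
    """Column naming convention: prefix every result column with the job id."""
    return f"{job_id}:{column}"


def write_result(table: Table, job_id: str, row_key: str,
                 column: str, value: str) -> None:
    """Stage a result; readers ignore it until the job is committed."""
    table.setdefault(row_key, {})[staged_name(job_id, column)] = value


def commit(table: Table, job_id: str) -> None:
    """Publish the job with one write to the marker column."""
    table.setdefault(META_ROW, {})[COMMIT_COLUMN] = job_id


def read_result(table: Table, row_key: str, column: str) -> Optional[str]:
    """Read the value written by the most recently committed job, if any."""
    job_id = table.get(META_ROW, {}).get(COMMIT_COLUMN)
    if job_id is None:
        return None
    return table.get(row_key, {}).get(staged_name(job_id, column))
```

The design choice here is that the only step that has to be all-or-nothing is the single marker write in commit(). The cost is that columns from failed or superseded jobs accumulate and would need a separate cleanup pass, which this sketch does not include.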