cassandra-user mailing list archives

From Mark Schnitzius <>
Subject Re: Hadoop over Cassandra
Date Wed, 19 May 2010 04:40:46 GMT
> If anyone has "war stories" on the topic of Cassandra & Hadoop (or
> even just Hadoop in general) let me know.

Don't know if it counts as a war story, but I recently succeeded in
implementing something I got advice on in an earlier thread, namely feeding
both a Cassandra table and a Hadoop sequence file into the same map/reduce
job and updating the same Cassandra table with the results.  I used the
approach I mentioned before: creating an InputFormat that returns splits
from both sources, and a RecordReader that massages the Cassandra data into
the same format as the sequence file data.  I'll write something up about it
for the wiki when I can find some time.
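
To make the shape of that concrete, here is a minimal sketch in plain Java
with no Hadoop or Cassandra dependency.  It is not the actual code from my
job: Split, KeyValue, getSplits and fromCassandraRow are hypothetical names
that only model the two pieces, a split list drawn from both sources and a
conversion of a Cassandra row into the sequence file's key/value shape.  In
a real job these roles would be played by an InputFormat's getSplits() and a
RecordReader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of the idea; every name here is hypothetical,
// not part of the Hadoop or Cassandra APIs.
class CombinedInputSketch {
    enum Source { CASSANDRA, SEQUENCE_FILE }

    // Stand-in for an input split, tagged with the source it came from.
    static final class Split {
        final Source source;
        final String location; // a token range or a file path
        Split(Source source, String location) {
            this.source = source;
            this.location = location;
        }
    }

    // The single record shape the mapper sees, whatever the source.
    static final class KeyValue {
        final String key;
        final String value;
        KeyValue(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Return the splits of both sources as one list, Cassandra first.
    static List<Split> getSplits(List<String> tokenRanges, List<String> filePaths) {
        List<Split> splits = new ArrayList<>();
        for (String range : tokenRanges) {
            splits.add(new Split(Source.CASSANDRA, range));
        }
        for (String path : filePaths) {
            splits.add(new Split(Source.SEQUENCE_FILE, path));
        }
        return splits;
    }

    // Reduce a Cassandra row (key plus column map) to the sequence file's
    // key/value shape by picking out one column.
    static KeyValue fromCassandraRow(String rowKey, Map<String, String> columns,
                                     String columnName) {
        return new KeyValue(rowKey, columns.get(columnName));
    }

    public static void main(String[] args) {
        List<Split> splits =
            getSplits(List.of("0-100", "100-200"), List.of("/data/part-00000"));
        for (Split s : splits) {
            System.out.println(s.source + " " + s.location);
        }
        KeyValue kv = fromCassandraRow("user42", Map.of("score", "7"), "score");
        System.out.println(kv.key + " -> " + kv.value);
    }
}
```

The point of tagging each split with its source is that the record reader
can pick the right decoding per split, so the mapper never needs to know
where a record came from.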

My chief concern with it, though, is gracefully handling a map/reduce
failure.  As Cassandra isn't transactional, the table may end up partially
updated, which is a problem, at least in the domain I'm working in.  So now
I'm trying to come up with a way to effect Cassandra transactions via column
naming conventions or indexes or something like that.  I'd be curious to
hear if anyone here has ever implemented a solution for something similar.
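
To show what I mean by a column naming convention, here is one possible
sketch, not something I have running.  A row is modelled as an in-memory map
from column name to value, and the "pending:<jobId>:" prefix and the
stage/commit/rollback names are all made up for illustration.  The job
writes its results under prefixed column names, and they are only renamed to
the live names once the whole job has succeeded.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;

// In-memory model only; the prefix and method names are hypothetical.
class StagedUpdateSketch {
    static final String PENDING = "pending:";

    // Write a value under a job-specific staged column name.
    static void stage(Map<String, String> row, String jobId,
                      String column, String value) {
        row.put(PENDING + jobId + ":" + column, value);
    }

    // After the job succeeds: move each staged value to its live column.
    static void commit(Map<String, String> row, String jobId) {
        String prefix = PENDING + jobId + ":";
        // Iterate over a copy of the names because the map is modified.
        for (String name : new ArrayList<>(row.keySet())) {
            if (name.startsWith(prefix)) {
                String value = row.remove(name);
                row.put(name.substring(prefix.length()), value);
            }
        }
    }

    // After the job fails: drop the staged columns, leaving live ones alone.
    static void rollback(Map<String, String> row, String jobId) {
        String prefix = PENDING + jobId + ":";
        row.keySet().removeIf(name -> name.startsWith(prefix));
    }

    public static void main(String[] args) {
        Map<String, String> row = new TreeMap<>();
        row.put("score", "7");
        stage(row, "job1", "score", "9");
        rollback(row, "job1");   // failed job: the live value is untouched
        System.out.println(row);
        stage(row, "job2", "score", "9");
        commit(row, "job2");     // successful job: the live value is replaced
        System.out.println(row);
    }
}
```

This does not make anything atomic: the commit pass touches rows one at a
time and can itself fail halfway.  What it buys is that a failed job's
writes are identifiable by name and can be removed or re-applied, provided
ordinary readers ignore the prefixed columns.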

