incubator-cassandra-user mailing list archives

From Brian O'Neill <>
Subject Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.
Date Tue, 27 Dec 2011 15:12:51 GMT

I just pulled the code and read through the design.  Great stuff.

Any thought to potentially using this for real-time processing as well?  Right now, we have
a set of Hadoop M/R jobs that operate against Cassandra for ETL.  We were looking at using
Storm for the real-time processing side of things and thought that we could actually abandon
Hadoop entirely if we could introduce Cassandra's concept of data locality to Storm.  We plan
to run head-to-head comparisons between Storm and Hadoop to test out the viability of that.

Peregrine looks like another contender.


On Dec 27, 2011, at 6:14 AM, Kevin Burton wrote:

> A key innovation here is a partitioning layout algorithm that can support fast
> many to many recovery similar to HDFS but still support partitioned operation
> with deterministic key placement.
> Thanks for your contribution.
> Is there more detailed info on this point?
> yes... our design document:
> I actually will probably write a paper on this... 
> The more I started down the partitioned filesystem approach in terms of mapreduce the
> more I realized that there were some REALLY elegant implementation and design issues that
> I did not originally appreciate ... (so I partially got lucky).
> I think this approach could be generalized to work on normal map reduce jobs without
> much overhead.
> -- 
> Founder/CEO
> Location: San Francisco, CA
> Skype: burtonator
> Skype-in: (415) 871-0687

Brian ONeill
Lead Architect, Health Market Science (
