incubator-cassandra-user mailing list archives

From Zhu Han <schumi....@gmail.com>
Subject Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.
Date Tue, 27 Dec 2011 09:51:35 GMT
On Tue, Dec 27, 2011 at 2:31 PM, Kevin Burton <burtonator@gmail.com> wrote:

>
> I'm pleased to announce Peregrine 0.5.0 - a new map reduce framework
> optimized for iterative and pipelined map reduce jobs.
>
> http://peregrine_mapreduce.bitbucket.org/
>
> This originally started off with some internal work at Spinn3r to build a
> fast and efficient Pagerank implementation.  We realized that what we wanted
> was a MR runtime optimized for this type of work which differs radically
> from the traditional Hadoop design.
>
> Peregrine implements a partitioned distributed filesystem where key/value
> pairs are routed to defined partitions.  This enables work to be joined
> against previous iterations or different units of work by the same key on
> the same local system.
>
> Peregrine is optimized for ETL jobs where the primary data storage system
> is an external database such as Cassandra, HBase, MySQL, etc.  Jobs are then
> run as Extract, Transform and Load stages with intermediate data being
> stored in the Peregrine FS.
>
> We enable features such as Map/Reduce/Merge as well as some additional
> functionality like ExtractMap and ReduceLoad (in ETL parlance).
>
> A key innovation here is a partitioning layout algorithm that can support
> fast many-to-many recovery similar to HDFS but still support partitioned
> operation with deterministic key placement.
>

Thanks for your contribution.

Is there more detailed information on this point?
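
To make the question concrete, here is a minimal sketch of what "deterministic key placement" combined with "many-to-many recovery" could look like. This is only my guess at the general idea, not Peregrine's actual algorithm: the function names `partition_for` and `build_layout`, the MD5-based hash, and the round-robin replica offsets are all assumptions made for illustration.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a key to a partition with a stable hash.

    Hypothetical helper: the same key always lands in the same partition,
    so records from different jobs can be joined locally by key.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def build_layout(num_hosts: int, partitions_per_host: int, replicas: int = 2):
    """Return {partition: [primary_host, replica_host, ...]}.

    Assumed scheme: each host owns a contiguous block of partitions, and the
    replicas of those partitions are spread round-robin over the other hosts,
    so a failed host can be rebuilt from many peers at once.
    """
    if not 1 <= replicas <= num_hosts:
        raise ValueError("replicas must be between 1 and num_hosts")
    layout = {}
    for host in range(num_hosts):
        for j in range(partitions_per_host):
            partition = host * partitions_per_host + j
            hosts = [host]
            for r in range(replicas - 1):
                # Offset is always in 1..num_hosts-1, so a replica never
                # lands on the primary host itself.
                offset = 1 + (j * (replicas - 1) + r) % (num_hosts - 1)
                hosts.append((host + offset) % num_hosts)
            layout[partition] = hosts
    return layout
```

With 4 hosts and 3 partitions per host, the three partitions owned by host 0 get their replicas on hosts 1, 2 and 3, so recovering host 0 would read from three machines in parallel while every key still maps to one fixed partition.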


>
> We've also tried to optimize for single instance performance and use
> modern IO primitives as much as possible.  This includes NOT shying away
> from operating-system-specific features such as mlock, fadvise, fallocate,
> etc.
>
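
For readers unfamiliar with these primitives, here is a small sketch of two of them as exposed by Python's `os` module on Linux and other Unix systems. It is not taken from Peregrine (which I have not read); the helper name `preallocate_and_read` and the 64 KiB read size are assumptions chosen for illustration.

```python
import os


def preallocate_and_read(path: str, size: int) -> int:
    """Preallocate `size` bytes for a file, then read it back sequentially.

    Hypothetical helper showing posix_fallocate and posix_fadvise;
    both calls are only available on Unix-like systems.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Reserve the disk space up front so later writes cannot fail
        # with "no space left" and the file is less likely to fragment.
        os.posix_fallocate(fd, 0, size)
        # Hint to the kernel that the range will be read sequentially,
        # which lets it read ahead more aggressively.
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_SEQUENTIAL)
        total = 0
        while True:
            chunk = os.read(fd, 1 << 16)
            if not chunk:
                break
            total += len(chunk)
        return total
    finally:
        os.close(fd)
```

Both calls are hints or reservations only; the data read back from a freshly preallocated file is all zero bytes.
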
> There is still a bit more work I want to do before I am ready to benchmark
> it against Hadoop.  Instead of implementing a synthetic benchmark we wanted
> to get a production ready version first which would allow people to port
> existing applications and see what the before / after performance numbers
> looked like in the real world.
>
> For more information please see:
>
> http://peregrine_mapreduce.bitbucket.org/
>
> As well as our design documentation:
>
> http://peregrine_mapreduce.bitbucket.org/design/
>
>
>
> --
>
> Founder/CEO Spinn3r.com <http://spinn3r.com/>
>
> Location: *San Francisco, CA*
> Skype: *burtonator*
>
> Skype-in: *(415) 871-0687*
>
>
