From: Kevin Burton
Date: Mon, 26 Dec 2011 22:31:39 -0800
Subject: Peregrine: A new map reduce framework for iterative/pipelined jobs.
To: user@cassandra.apache.org

I'm pleased to announce Peregrine 0.5.0 - a new map reduce framework optimized
for iterative and pipelined map reduce jobs.

http://peregrine_mapreduce.bitbucket.org/

This originally started off with some internal work at Spinn3r to build a fast
and efficient PageRank implementation. We realized that what we wanted was an MR
runtime optimized for this type of work, which differs radically from the
traditional Hadoop design.

Peregrine implements a partitioned distributed filesystem where key/value pairs
are routed to defined partitions. This enables work to be joined against
previous iterations or different units of work by the same key on the same local
system.
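
To make the routing idea concrete, here is a minimal sketch of deterministic key placement followed by a partition-local join. It is an illustration only, not Peregrine's code: the CRC32-modulo hash, the partition count, and the names `partition_for`, `route`, and `local_join` are all assumptions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition; CRC32 is stable across runs and hosts."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(pairs, num_partitions: int = NUM_PARTITIONS):
    """Group (key, value) pairs by the partition their key belongs to."""
    partitions = defaultdict(dict)
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)][key] = value
    return partitions


def local_join(left, right):
    """Join two routed datasets one partition at a time.

    Because both sides were routed with the same function, matching keys
    always sit in the same partition, so no cross-partition lookup is needed.
    """
    joined = {}
    for pid, left_part in left.items():
        right_part = right.get(pid, {})
        for key, left_value in left_part.items():
            if key in right_part:
                joined[key] = (left_value, right_part[key])
    return joined


# Output of a previous iteration and a different unit of work, same keys.
previous_ranks = route([("a.com", 0.5), ("b.com", 0.3), ("c.com", 0.2)])
out_links = route([("a.com", ["b.com"]), ("b.com", ["a.com", "c.com"])])
joined = local_join(previous_ranks, out_links)
```

In a real cluster each partition would live on a specific host, so the inner loop of `local_join` would run where the data already is.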

Peregrine is optimized for ETL jobs where the primary data storage system is an
external database such as Cassandra, HBase, MySQL, etc. Jobs are then run as
Extract, Transform and Load stages, with intermediate data stored in the
Peregrine FS.

We enable features such as Map/Reduce/Merge as well as some additional
functionality like ExtractMap and ReduceLoad (in ETL parlance).
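
As a rough illustration of why a merge step helps iterative jobs, the sketch below runs PageRank as repeated map, reduce, and merge passes. It assumes "merge" means combining the reduce output with the previous iteration by key; the function names, the toy graph, and the damping factor are assumptions and are not taken from Peregrine's API.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional PageRank damping factor (assumed here)


def map_phase(graph, ranks):
    """Emit a (target, contribution) pair for every out-link."""
    for node, targets in graph.items():
        if not targets:
            continue
        share = ranks[node] / len(targets)
        for target in targets:
            yield target, share


def reduce_phase(pairs):
    """Sum the contributions received by each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def merge_phase(previous_ranks, totals):
    """Merge the reduce output with the previous iteration by key.

    Nodes with no in-links are missing from the reduce output, so the merge
    against the previous rank vector is what keeps them in the result.
    """
    n = len(previous_ranks)
    return {
        node: (1 - DAMPING) / n + DAMPING * totals.get(node, 0.0)
        for node in previous_ranks
    }


# Toy graph in which every node has at least one out-link.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = {node: 1 / len(graph) for node in graph}
for _ in range(20):  # each pass consumes the previous pass's output
    ranks = merge_phase(ranks, reduce_phase(map_phase(graph, ranks)))
```

Node `d` has no in-links, so it only survives each iteration because of the merge against the previous rank vector.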

A key innovation here is a partitioning layout algorithm that can support fast
many-to-many recovery similar to HDFS while still supporting partitioned
operation with deterministic key placement.
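
The two properties can be shown together in a small sketch. This is not Peregrine's layout algorithm (see the design documentation for that); it is a hypothetical rotation scheme, with made-up names `build_layout`, `holders_for_key`, and `recovery_sources`, in which keys map deterministically to partitions and each host's partitions are replicated on different peers, so a failed host is rebuilt from many hosts at once.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic key placement: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def build_layout(hosts, partitions_per_host, replicas):
    """Return {partition_id: [primary, replica, ...]} (hypothetical scheme).

    Replica offsets rotate with the partition slot, so the partitions owned
    by one host are replicated on several different peers.
    """
    n = len(hosts)
    if replicas > n - 1:
        raise ValueError("need more hosts than replicas")
    layout = {}
    for h, host in enumerate(hosts):
        for slot in range(partitions_per_host):
            pid = h * partitions_per_host + slot
            holders = [host]
            for r in range(replicas):
                # Offset is in 1..n-1, so a replica never lands on the primary.
                offset = 1 + (slot * replicas + r) % (n - 1)
                holders.append(hosts[(h + offset) % n])
            layout[pid] = holders
    return layout


def holders_for_key(key, layout):
    """Hosts that store the partition a key belongs to."""
    return layout[partition_for(key, len(layout))]


def recovery_sources(layout, failed_host):
    """For each partition the failed host held, list the surviving holders."""
    return {
        pid: [h for h in holders if h != failed_host]
        for pid, holders in layout.items()
        if failed_host in holders
    }


hosts = ["h0", "h1", "h2", "h3"]
layout = build_layout(hosts, partitions_per_host=3, replicas=1)
sources = recovery_sources(layout, "h0")
peers = {h for holders in sources.values() for h in holders}
```

With four hosts and three partitions per host, the partitions lost with `h0` are spread over all three remaining hosts, so recovery reads are not funneled through a single machine.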

We've also tried to optimize for single instance performance and use modern IO
primitives as much as possible. This includes NOT shying away from
operating-system-specific features such as mlock, fadvise, fallocate, etc.
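
For readers unfamiliar with these calls, the sketch below shows two of them through Python's `os` module, which wraps `posix_fallocate` and `posix_fadvise` on Unix-like systems. It is only an illustration of the primitives, not how Peregrine uses them; the helper names, the file name, and the 1 MiB size are assumptions, and mlock is left out to keep the sketch to standard-library calls.

```python
import os
import tempfile


def preallocate(fd: int, size: int) -> None:
    """Reserve `size` bytes for the file up front, falling back to ftruncate."""
    if hasattr(os, "posix_fallocate"):
        try:
            os.posix_fallocate(fd, 0, size)
            return
        except OSError:
            pass  # the filesystem may not support preallocation
    os.ftruncate(fd, size)


def advise_sequential(fd: int) -> bool:
    """Hint that the file will be read sequentially.

    Returns True if the hint was applied, False if the platform lacks it.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    try:
        # A length of 0 means "from the offset to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    except OSError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chunk.dat")  # hypothetical chunk file
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        preallocate(fd, 1 << 20)
        hinted = advise_sequential(fd)
        allocated_size = os.fstat(fd).st_size
    finally:
        os.close(fd)
```

Both helpers degrade gracefully, so the same code still runs on platforms where the calls are missing.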

There is still a bit more work I want to do before I am ready to benchmark it
against Hadoop. Instead of implementing a synthetic benchmark we wanted to get
a production-ready version first, which would allow people to port existing
applications and see what the before/after performance numbers look like in
the real world.

For more information please see:

http://peregrine_mapreduce.bitbucket.org/

As well as our design documentation:
http://peregrine_mapreduce.bitbucket.org/design/



--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
Skype-in: (415) 871-0687

