From: Brian O'Neill
Subject: Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.
Date: Tue, 27 Dec 2011 10:12:51 -0500
To: user@cassandra.apache.org
Message-Id: <2BA42BA4-9D96-4FF1-8A91-E03602726CE7@gmail.com>

Kevin,

I just pulled the code and read through the design. Great stuff.

Any thought to potentially using this for real-time processing as well? Right now, we have a set of Hadoop M/R jobs that operate against Cassandra for ETL. We were looking at using Storm for the real-time processing side of things and thought that we could actually abandon Hadoop entirely if we could introduce Cassandra's concept of data locality to Storm. We plan to run head-to-head comparisons between Storm and Hadoop to test out the viability of that approach.

Peregrine looks like another contender.

cheers,
-brian

On Dec 27, 2011, at 6:14 AM, Kevin Burton wrote:

> A key innovation here is a partitioning layout algorithm that can support fast
> many to many recovery similar to HDFS but still support partitioned operation
> with deterministic key placement.
>
> Thanks for your contribution.
>
> Is there more detailed info on this point?
>
> yes... our design document:
>
> http://peregrine_mapreduce.bitbucket.org/design/
>
> I actually will probably write a paper on this...
>
> The more I started down the partitioned filesystem approach in terms of mapreduce the more I realized that there were some REALLY elegant implementation and design issues that I did not originally appreciate ... (so I partially got lucky).
>
> I think this approach could be generalized to work on normal map reduce jobs without much overhead.
>
> --
> Founder/CEO Spinn3r.com
>
> Location: San Francisco, CA
> Skype: burtonator
> Skype-in: (415) 871-0687

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
