Date: Tue, 26 Aug 2014 08:20:45 -0400
Subject: Re: CoHadoop Papers
From: Gary Malouf
To: "dev@spark.apache.org"

It appears support for this type of control over block placement is going out in the next version of HDFS:
https://issues.apache.org/jira/browse/HDFS-2576

On Tue, Aug 26, 2014 at 7:43 AM, Gary Malouf wrote:

> One of my colleagues has been questioning me as to why Spark/HDFS makes no
> attempts to try to co-locate related data blocks. He pointed to this
> paper: http://www.vldb.org/pvldb/vol4/p575-eltabakh.pdf from 2011 on the
> CoHadoop research and the performance improvements it yielded for
> Map/Reduce jobs.
>
> Would leveraging these ideas for writing data from Spark make sense/be
> worthwhile?