Mailing-List: contact dev-help@spark.apache.org; run by ezmlm
MIME-Version: 1.0
In-Reply-To: 
 <CAGOvqipJEM=biUE5W7r_f+2q5CTh_Ndnq=PE3rHfVMruBWN05g@mail.gmail.com>
References: 
 <CAGOvqipJEM=biUE5W7r_f+2q5CTh_Ndnq=PE3rHfVMruBWN05g@mail.gmail.com>
Date: Tue, 26 Aug 2014 08:20:45 -0400
Message-ID: 
 <CAGOvqio9vdiDApeV=XshexXtCnkwbBbBZPeCeAo2EaOhFNMGng@mail.gmail.com>
Subject: Re: CoHadoop Papers
From: Gary Malouf <malouf.gary@gmail.com>
To: "dev@spark.apache.org" <dev@spark.apache.org>
Content-Type: multipart/alternative; boundary=001a113a666437d4a80501875584

--001a113a666437d4a80501875584
Content-Type: text/plain; charset=UTF-8

It appears that support for this kind of control over block placement is
shipping in the next version of HDFS:
https://issues.apache.org/jira/browse/HDFS-2576


On Tue, Aug 26, 2014 at 7:43 AM, Gary Malouf <malouf.gary@gmail.com> wrote:

> One of my colleagues has been asking me why Spark/HDFS makes no attempt
> to co-locate related data blocks.  He pointed to this
> paper: http://www.vldb.org/pvldb/vol4/p575-eltabakh.pdf from 2011 on the
> CoHadoop research and the performance improvements it yielded for
> Map/Reduce jobs.
>
> Would it be worthwhile to apply these ideas when writing data from
> Spark?
>

--001a113a666437d4a80501875584--