MapR does this already, and for well beyond just 2 files. One can arrange things so that a boatload of files have all their replicas placed on the same set of nodes, i.e., files A ... Z will have replica1 on node1, replica2 on node2, replica3 on node3, etc. (Nodes 1, 2 and 3 are picked by the system based on utilization and node fullness.)
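
To make the idea concrete, here is a rough conceptual sketch in Python. It is not MapR's or HDFS's actual placement code; the function names (`pick_replica_nodes`, `place_group`), the utilization numbers, and the "least-utilized first" rule are all assumptions made up for illustration. It only shows the shape of the result: every file in a group ends up with the same ordered list of replica nodes.

```python
# Conceptual sketch only: not MapR's or HDFS's real placement logic.
# All names and the selection rule here are hypothetical.
from typing import Dict, List


def pick_replica_nodes(utilization: Dict[str, float], replication: int = 3) -> List[str]:
    """Pick the `replication` least-utilized nodes (ties broken by node name)."""
    if replication > len(utilization):
        raise ValueError("not enough nodes for the requested replication")
    ranked = sorted(utilization, key=lambda node: (utilization[node], node))
    return ranked[:replication]


def place_group(files: List[str], utilization: Dict[str, float],
                replication: int = 3) -> Dict[str, List[str]]:
    """Give every file in the group the same ordered list of replica nodes."""
    nodes = pick_replica_nodes(utilization, replication)
    # replica1 -> nodes[0], replica2 -> nodes[1], ... for every file
    return {name: list(nodes) for name in files}


if __name__ == "__main__":
    # Made-up utilization figures (fraction of capacity in use).
    utilization = {"node1": 0.20, "node2": 0.35, "node3": 0.40, "node4": 0.90}
    for name, nodes in place_group(["A", "B", "Z"], utilization).items():
        print(name, nodes)
```

Because the node set is chosen once per group rather than once per file, a map task scheduled on any of those nodes would find a local replica of every file in the group, which is the property a map-side join needs.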




On Wed, Dec 5, 2012 at 11:26 AM, Sigurd Spieckermann <sigurd.spieckermann@gmail.com> wrote:
Awesome! That's exactly what I'm looking for. Hadn't seen the JIRA. I hope this is coming soon!

On 05.12.2012 18:58, Harsh J wrote:

You are probably talking about
https://issues.apache.org/jira/browse/HDFS-2576 and similar JIRAs.
This feature isn't available in HDFS yet, but may arrive soon.

On Wed, Dec 5, 2012 at 11:23 PM, Sigurd Spieckermann
<sigurd.spieckermann@gmail.com> wrote:
Hi guys,

I have been wondering if there's a way (hackish would be okay too) to tell
Hadoop that two files shall be stored together at the same location(s). It
would benefit map-side join performance if it could be done somehow because
all map tasks would be able to read data from a local copy. Does anyone know
a way?

-Sigurd