hadoop-common-user mailing list archives

From Rasit OZDAS <rasitoz...@gmail.com>
Subject Copying a file to specified nodes
Date Tue, 10 Feb 2009 13:05:32 GMT

We have thousands of files, each dedicated to a user.  (Each user can
access other users' files, but does so only rarely.)
Each user runs map-reduce jobs on the cluster.
So we should spread each user's files evenly across the cluster,
so that every machine can take part in the processing (assuming he/she is
the only user running jobs).
To do this, we need to copy the files to specified nodes initially:
User A :   first file : Node 1, second file: Node 2, .. etc.
User B :   first file : Node 1, second file: Node 2, .. etc.
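
To make the intended layout concrete, here is a minimal Python sketch of
the round-robin assignment described above. It only computes the desired
plan (which file should go to which node); it does not talk to HDFS or
place any blocks. The node names, file names, and the plan_placement
helper are hypothetical, made up for illustration.

```python
from itertools import cycle


def plan_placement(user_files, nodes):
    """Assign each user's files to nodes in round-robin order.

    Every user starts again at the first node, so a single user's files
    end up spread as evenly as possible over the whole cluster.
    """
    plan = {}
    for user, files in user_files.items():
        node_cycle = cycle(nodes)  # restart at the first node for each user
        plan[user] = [(name, next(node_cycle)) for name in files]
    return plan


# Hypothetical cluster and file names, for illustration only.
nodes = ["node1", "node2", "node3"]
user_files = {
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "b2"],
}

plan = plan_placement(user_files, nodes)
for user, assignments in plan.items():
    for name, node in assignments:
        print(f"User {user}: {name} -> {node}")
```

With more files than nodes, the assignment simply wraps around to the
first node again, so no machine holds more than one extra file per user.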

I know Hadoop also creates replicas, but in our solution at least one
copy of each file would be in the right place
(or we're willing to control the other replicas too).

Rebalancing is also not a problem, assuming it takes into account
how much each machine is in use.
It could even help organize the files better.

How can we copy files to specified nodes?
Or do you have a better solution for us?

I couldn't find a way to do this; probably no such option exists.
But I wanted to get an expert's opinion on it.

Thanks in advance.
