hadoop-common-user mailing list archives

From Rasit OZDAS <rasitoz...@gmail.com>
Subject Re: Copying a file to specified nodes
Date Mon, 16 Feb 2009 15:17:18 GMT
Yes, I've tried the long solution:
when I execute   ./hadoop dfs -put ...   on a datanode,
one copy is always written to that datanode.

But I think I'll have to use SSH for this.
Does anybody know a better way?
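As a sketch of that SSH approach (the hostnames, local paths, and HADOOP_HOME below are assumptions, not taken from this thread), a dry-run script that prints the command it would run against each target datanode:

```shell
#!/bin/sh
# Dry-run sketch: run "hadoop dfs -put" on the target datanode over SSH,
# relying on HDFS writing the first replica to the local datanode.
# Hostnames, paths, and HADOOP_HOME are hypothetical; replace "echo"
# with a real invocation to actually execute the copies.
HADOOP_HOME=/opt/hadoop

put_on_node() {
  node="$1"; local_file="$2"; hdfs_path="$3"
  # Print the command that would place the first replica on $node.
  echo ssh "$node" "$HADOOP_HOME/bin/hadoop dfs -put $local_file $hdfs_path"
}

put_on_node node1 /data/userA/file1 /user/a/file1
put_on_node node2 /data/userA/file2 /user/a/file2
```

This only centralizes the per-node `put` commands; it does not control where the remaining replicas go.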


2009/2/16 Rasit OZDAS <rasitozdas@gmail.com>:
> Thanks, Jeff.
> After reading the JIRA link you've given and doing some investigation:
> it seems that this JIRA ticket didn't draw much attention, so it will
> probably take a long time to be considered.
> After some more investigation I found out that when I copy a file to
> HDFS from a specific DataNode, the first replica is written to that
> DataNode itself. This solution will take a while to implement, I think,
> but we definitely need this feature, so if we have no other choice,
> we'll go through with it.
> Any further info (or comments on my solution) is appreciated.
> Cheers,
> Rasit
> 2009/2/10 Jeff Hammerbacher <hammer@cloudera.com>:
>> Hey Rasit,
>> I'm not sure I fully understand your description of the problem, but
>> you might want to check out the JIRA ticket for making the replica
>> placement algorithms in HDFS pluggable
>> (https://issues.apache.org/jira/browse/HADOOP-3799) and add your use
>> case there.
>> Regards,
>> Jeff
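For reference, the pluggable replica placement that HADOOP-3799 asks for only landed in Hadoop releases later than the ones discussed in this thread; there, a custom policy class (the class name below is hypothetical) is wired in through hdfs-site.xml roughly like this:

```xml
<!-- Sketch only: dfs.block.replicator.classname exists in later Hadoop
     releases; UserAwarePlacementPolicy is a hypothetical custom class. -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>com.example.UserAwarePlacementPolicy</value>
</property>
```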
>> On Tue, Feb 10, 2009 at 5:05 AM, Rasit OZDAS <rasitozdas@gmail.com> wrote:
>>> Hi,
>>> We have thousands of files, each dedicated to a user.  (Each user has
>>> access to other users' files, but they do so only rarely.)
>>> Each user runs map-reduce jobs on the cluster.
>>> So we should distribute his/her files evenly across the cluster,
>>> so that every machine can take part in the process (assuming he/she is
>>> the only user running jobs).
>>> For this we should initially copy files to specified nodes:
>>> User A :   first file : Node 1, second file: Node 2, .. etc.
>>> User B :   first file : Node 1, second file: Node 2, .. etc.
>>> I know Hadoop also creates replicas, but with our solution at least one
>>> copy of each file will be in the right place
>>> (or we're willing to control the other replicas too).
>>> Rebalancing is also not a problem, assuming it takes into account
>>> how heavily each machine is used.
>>> It even helps for a better organization of files.
>>> How can we copy files to specified nodes?
>>> Or do you have a better solution for us?
>>> I couldn't find a solution to this, probably such an option doesn't exist.
>>> But I wanted to take an expert's opinion about this.
>>> Thanks in advance.
>>> Rasit
> --
> M. Raşit ÖZDAŞ
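The round-robin scheme in the quoted question (a user's i-th file goes to node i mod N) can be sketched as follows; the node names are hypothetical, and HDFS itself exposes no client-side API for this choice, which is why the thread falls back to running the `put` on the target node:

```shell
#!/bin/sh
# Sketch of the per-user round-robin mapping described in the question:
# the i-th file of a user is assigned to node (i mod N).
# Node names are hypothetical.
nodes="node1 node2 node3"

assign_node() {
  i="$1"
  set -- $nodes          # load the node list into $1..$N
  shift $(( i % $# ))    # rotate to the i-th node (modulo N)
  echo "$1"
}

assign_node 0   # first file  -> node1
assign_node 3   # fourth file wraps back to node1
```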

M. Raşit ÖZDAŞ
