hadoop-common-user mailing list archives

From Ondřej Klimpera <klimp...@fit.cvut.cz>
Subject Re: Dealing with low space cluster
Date Thu, 14 Jun 2012 14:16:22 GMT
Thanks, I'll try.

One more question: I've got a few more nodes that can be added to the
cluster, but how do I do that?

If I understand it correctly (according to Hadoop's wiki pages):

1. On the master node, edit the slaves file and add the IP addresses of
the new nodes (this is clear).
2. Log in to each newly added node and run (this is clear to me too):

$ hadoop-daemon.sh start datanode
$ hadoop-daemon.sh start tasktracker

3. Here I'm not sure. I'm not using dfs.include/mapred.include, so do I
have to run these commands?

$ hadoop dfsadmin -refreshNodes
$ hadoop mradmin -refreshNodes

If yes, must they be run on the master node or on the new slave nodes?
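
The three steps above can be sketched as a small script. This is only a dry-run sketch for a Hadoop 1.x-style setup: it prints the commands instead of running them. The `add_node_plan` function, the `SLAVES_FILE` default, and the example IP addresses are hypothetical, and running the refresh commands from the master is an assumption (the usual choice, since they talk to the NameNode and JobTracker). As far as I know, `-refreshNodes` makes those daemons re-read their include/exclude host files, so it mainly matters when such files are configured.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for adding nodes, does not execute them.
# SLAVES_FILE is an assumed location; adjust it to your installation.
SLAVES_FILE="${SLAVES_FILE:-conf/slaves}"

add_node_plan() {
    for node in "$@"; do
        # Step 1: on the master, list the new node in the slaves file.
        echo "master\$ echo $node >> $SLAVES_FILE"
        # Step 2: on the new node, start the HDFS and MapReduce daemons.
        echo "$node\$ hadoop-daemon.sh start datanode"
        echo "$node\$ hadoop-daemon.sh start tasktracker"
    done
    # Step 3: re-read include/exclude host lists (assumed to run on master;
    # likely only relevant when include/exclude files are configured).
    echo "master\$ hadoop dfsadmin -refreshNodes"
    echo "master\$ hadoop mradmin -refreshNodes"
}

# Hypothetical addresses, for illustration only.
add_node_plan 10.0.0.21 10.0.0.22
```

Printing the plan first makes it easy to review what would be run on which machine before touching the cluster.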


On 06/14/2012 04:03 PM, Harsh J wrote:
> Ondřej,
> That isn't currently possible with local storage FS. Your 1 TB NFS
> point can help but I suspect it may act as a slow-down point if nodes
> use it in parallel. Perhaps mount it only on 3-4 machines (or fewer),
> instead of all, to avoid that?
> On Thu, Jun 14, 2012 at 7:28 PM, Ondřej Klimpera <klimpond@fit.cvut.cz> wrote:
>> Hello,
>> you're right. That's exactly what I meant. And your answer is exactly what I
>> thought. I was just wondering if Hadoop can distribute the data to other
>> nodes' local storage if a node's own local space is full.
>> Thanks
>> On 06/14/2012 03:38 PM, Harsh J wrote:
>>> Ondřej,
>>> If by processing you mean trying to write out (map outputs) > 20 GB of
>>> data per map task, that may not be possible, as the outputs need to be
>>> materialized and the disk space is the constraint there.
>>> Or did I not understand you correctly (in thinking you are asking
>>> about MapReduce)? Because otherwise you have ~50 GB of space available for
>>> HDFS consumption (assuming replication = 3 for proper reliability).
>>> On Thu, Jun 14, 2012 at 1:25 PM, Ondřej Klimpera <klimpond@fit.cvut.cz> wrote:
>>>> Hello,
>>>> we're testing an application on 8 nodes, where each node has 20 GB of local
>>>> storage available. What we are trying to achieve is to process more than
>>>> 20 GB of data on this cluster.
>>>> Is there a way to distribute the data across the cluster?
>>>> There is also one shared NFS storage disk with 1 TB of available space,
>>>> which is currently unused.
>>>> Thanks for your reply.
>>>> Ondrej Klimpera
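
The ~50 GB figure Harsh J mentions follows from the numbers in the thread: 8 nodes with 20 GB each give 160 GB of raw space, and with replication = 3 every block is stored three times. A rough sketch of that arithmetic is below; the `usable_hdfs_gb` helper is hypothetical, and the estimate ignores space needed for map outputs, logs, and other non-HDFS use, so the real figure would be lower.

```shell
#!/bin/sh
# usable_hdfs_gb NODES GB_PER_NODE REPLICATION
# Rough upper bound on unique data HDFS can hold (integer GB).
usable_hdfs_gb() {
    echo $(( $1 * $2 / $3 ))
}

# Numbers from the thread: 8 nodes, 20 GB each, replication factor 3.
usable_hdfs_gb 8 20 3   # prints 53
```

Lowering the replication factor raises the figure (for example, `usable_hdfs_gb 8 20 2` prints 80), at the cost of the reliability Harsh J refers to.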
