hadoop-common-user mailing list archives

From Tapas Sarangi <tapas.sara...@gmail.com>
Subject Re: disk used percentage is not symmetric on datanodes (balancer)
Date Wed, 20 Mar 2013 04:12:11 GMT
Thanks for the reply. How can I assign a new value to the transfer speed for the balancer?
Is this the parameter dfs.balance.bandwidthPerSec?

Where should this go: in conf/hdfs-site.xml or conf/core-site.xml?

-Tapas
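
[Editor's note: a sketch of the setting being asked about, assuming the
0.20-era property name. dfs.balance.bandwidthPerSec is an HDFS setting, so
it belongs in conf/hdfs-site.xml; the value is in bytes per second, and the
datanodes need a restart to pick up a change.]

```xml
<!-- conf/hdfs-site.xml on the datanodes (sketch; 0.20-era property name).
     Value is in bytes per second: 10485760 = 10 MB/s, up from the
     default of 1048576 (1 MB/s). A datanode restart is required for
     the change to take effect on this release line. -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value>
</property>
```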

 
On Mar 19, 2013, at 11:05 PM, Harsh J <harsh@cloudera.com> wrote:

> If your balancer does not exit, it means it is still working heavily in
> iterations trying to balance your cluster. The default bandwidth
> allows only a limited transfer speed (10 Mbps) so as not to affect the
> cluster's read/write performance while moving blocks between DNs for
> balancing, so the operation may be slow unless you raise the allowed
> bandwidth.
> 
> On Wed, Mar 20, 2013 at 7:37 AM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>> Any more follow-ups?
>> 
>> Thanks
>> -Tapas
>> 
>> On Mar 19, 2013, at 9:55 AM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>> 
>>> 
>>> On Mar 18, 2013, at 11:50 PM, Harsh J <harsh@cloudera.com> wrote:
>>> 
>>>> What do you mean when you say the balancer is always active?
>>> 
>>> Meaning that the same process stays active for a long time; the
>>> process that starts may never exit. We have a cron job set to run it
>>> every 10 minutes, but that has no effect because the previous process
>>> is still running.
>>> 
>>> 
>>>> It is meant to be used as a tool, and it exits once it has balanced
>>>> in a specific run (it loops until it does, but always exits at the
>>>> end). The balancer does balance based on usage percentage, so that
>>>> is probably what you're looking for/missing.
>>>> 
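
[Editor's note: to sketch what "balances based on usage percentage" means,
here is an illustrative simplification, not the actual Balancer source (the
real thresholds and node grouping are more involved). The balancer compares
each datanode's utilization to the cluster-wide average and moves blocks
from nodes above the threshold band to nodes below it.]

```python
# Illustrative sketch of percentage-based datanode classification,
# simplified from what the HDFS balancer does. Names are illustrative.
def classify(nodes, threshold=10.0):
    """nodes: list of (name, used_bytes, capacity_bytes) tuples.

    Returns (over, under): names of nodes whose utilization is more
    than `threshold` percentage points above / below the cluster-wide
    average utilization."""
    total_used = sum(used for _, used, _ in nodes)
    total_cap = sum(cap for _, _, cap in nodes)
    avg = 100.0 * total_used / total_cap  # cluster-average utilization, %

    over, under = [], []
    for name, used, cap in nodes:
        util = 100.0 * used / cap  # this node's utilization, %
        if util > avg + threshold:
            over.append(name)      # source of block moves
        elif util < avg - threshold:
            under.append(name)     # destination of block moves
    return over, under
```

With the default 10% threshold, a small node at ~99% used and a large node
at ~35% used, around a ~50% cluster average, would be classified as over-
and under-utilized respectively.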
>>> 
>>> Maybe. How does the balancer look for the usage percentage?
>>> 
>>> -Tapas
>>> 
>>> 
>>>>> On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> On Mar 18, 2013, at 8:21 PM, 李洪忠 <lhztop@hotmail.com> wrote:
>>>>> 
>>>>> Maybe you need to modify the rack-awareness script to balance the
>>>>> racks, i.e., make all the racks the same size: one rack of 6 small
>>>>> nodes, one rack of 1 large node.
>>>>> P.S.
>>>>> You need to restart the cluster for the rack-awareness script
>>>>> change to take effect.
>>>>> 
>>>>> 
>>>>> As I mentioned earlier in my reply to Bertrand, we haven't set up
>>>>> rack awareness for the cluster; currently it is treated as just one
>>>>> rack. Could that be the problem? I don't know…
>>>>> 
>>>>> -Tapas
>>>>> 
>>>>> 
>>>>> 
>>>>> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>>>>> 
>>>>> And by "active", do you mean that it does actually stop by itself?
>>>>> If not, it might mean that the throttling/bandwidth limit is an
>>>>> issue with regard to the data volume or velocity.
>>>>> 
>>>>> What threshold is used?
>>>>> 
>>>>> About the small and big datanodes, how are they distributed with
>>>>> regard to racks?
>>>>> About files, how are the replication factor(s) and block size(s) used?
>>>>> 
>>>>> Surely trivial questions again.
>>>>> 
>>>>> Bertrand
>>>>> 
>>>>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Sorry about that, had it written, but thought it was obvious.
>>>>>> Yes, balancer is active and running on the namenode.
>>>>>> 
>>>>>> -Tapas
>>>>>> 
>>>>>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> It is not explicitly said but did you use the balancer?
>>>>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>>>>> 
>>>>>> Regards
>>>>>> 
>>>>>> Bertrand
>>>>>> 
>>>>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I am using one of the old legacy versions (0.20) of Hadoop for
>>>>>>> our cluster. We have scheduled an upgrade to a newer version
>>>>>>> within a couple of months, but I would like to understand a
>>>>>>> couple of things before moving towards the upgrade plan.
>>>>>>>
>>>>>>> We have about 200 datanodes and some of them have larger storage
>>>>>>> than others. The storage per datanode varies between 12 TB and
>>>>>>> 72 TB.
>>>>>>>
>>>>>>> We found that the disk-used percentage is not symmetric across
>>>>>>> all the datanodes. On the larger-storage nodes the percentage of
>>>>>>> disk space used is much lower than on the nodes with smaller
>>>>>>> storage. On the larger nodes the percentage of used disk space
>>>>>>> varies, but averages about 30-50%; for the smaller nodes it is as
>>>>>>> high as 99.9%. Is this expected? If so, we are not using a lot of
>>>>>>> the disk space effectively. Is this solved in a later release?
>>>>>>>
>>>>>>> If not, I would like to know if there are any checks/debugging
>>>>>>> steps one can do to find an improvement with the current version,
>>>>>>> or whether upgrading Hadoop should solve this problem.
>>>>>>> 
>>>>>>> I am happy to provide additional information if needed.
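
[Editor's note: a back-of-the-envelope check on the numbers above, using
illustrative figures only. If the balancer succeeds in equalizing the used
percentage, each node should end up storing roughly its capacity times the
cluster-average utilization, regardless of node size.]

```python
# Illustrative figures loosely based on the thread: two 12 TB nodes
# near 99.9% used and one 72 TB node near 40% used.
capacities_tb = [12, 12, 72]          # heterogeneous datanode capacities
used_tb = [11.99, 11.98, 28.8]        # current usage per node

# Cluster-average utilization (fraction of total capacity in use).
avg = sum(used_tb) / sum(capacities_tb)

# What each node should hold once percentages are equalized:
# capacity * average utilization, so big nodes hold more bytes
# but the same percentage.
target_tb = [cap * avg for cap in capacities_tb]
```

Here the cluster average is about 55%, so after balancing the small nodes
should drop to roughly 6.6 TB used each, while the large node rises to
roughly 39.6 TB, all at the same percentage.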
>>>>>>> 
>>>>>>> Thanks for any help.
>>>>>>> 
>>>>>>> -Tapas
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Bertrand Dechoux
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Harsh J
>>> 
>> 
> 
> 
> 
> -- 
> Harsh J

