hadoop-common-user mailing list archives

From Patai Sangbutsarakum <silvianhad...@gmail.com>
Subject toward Rack-Awareness approach
Date Wed, 29 Feb 2012 02:12:26 GMT
Hi Hadoopers,

I am currently running Hadoop 0.20.203 in production with 600 TB of data in it.
I am planning to enable rack awareness on this production cluster, but I have
not yet worked out the whole procedure.


1. I have a script that resolves a datanode/tasktracker IP to a rack name.
2. Add topology.script.file.name to hdfs-site.xml and restart the cluster.
3. After the cluster comes back up, my questions start here:
    - Do I have to run the balancer, fsck, or some other command to get those
600 TB redistributed across the different racks in one go?
    - Currently I run the balancer for 2 hours every day. Can I keep this
routine and expect that at some point the data will be properly
redistributed according to rack location?
    - How can we tell when the data in the cluster is fully
rack-aware?
    - If I just add the script and run the balancer for 2 hours every day,
then until all of the data becomes rack-aware, the cluster will hold a
      mix of existing data still on "default-rack" (not yet
rebalanced) and newly loaded data that is probably rack-aware.
      Is it OK to have a mix of default-rack and rack-specific data together?

4. Any other thoughts?
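
For reference, the kind of script meant in step 1 can be sketched roughly as below. This is only an illustration, not the actual production script: the subnet-to-rack table and the rack names are hypothetical placeholders. It relies on the usual contract for the script named by topology.script.file.name, which is that Hadoop passes one or more IP addresses or hostnames as arguments and reads back one rack path per argument, separated by whitespace.

```python
#!/usr/bin/env python
# Minimal sketch of a rack topology script (illustrative only).
import sys

# Hypothetical mapping from the first three octets of a node IP to a rack path.
RACK_BY_SUBNET = {
    "10.1.1": "/dc1/rack1",
    "10.1.2": "/dc1/rack2",
}

# Rack used for any node the table does not know about.
DEFAULT_RACK = "/default-rack"


def resolve(host):
    """Return the rack path for one IP address, or the default rack."""
    subnet = host.rsplit(".", 1)[0]  # drop the last octet
    return RACK_BY_SUBNET.get(subnet, DEFAULT_RACK)


def main(argv):
    # One rack path per argument, in the same order as the arguments.
    print(" ".join(resolve(h) for h in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run by hand with a couple of node IPs as arguments, it prints the matching rack paths on one line, which is an easy way to check the mapping before restarting the cluster.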

Hope this makes sense.

Thanks in advance
