hadoop-hdfs-user mailing list archives

From John Buchanan <John.Bucha...@infinitecampus.com>
Subject Centralized configuration management
Date Tue, 15 Feb 2011 14:51:11 GMT
Allen,

I wonder if you could say more about what you meant by (and what you use
for) configuration management?  In some initial research I'm finding
quite a few options for centralized configuration management, both open
source and commercial.  I would love to hear what others are using.

Thanks,

-John

On 2/8/11 11:25 AM, "Allen Wittenauer" <awittenauer@linkedin.com> wrote:

>
>On Feb 8, 2011, at 7:20 AM, John Buchanan wrote:
>> What we were thinking for our first deployment was 10 HP DL385's,
>> each with 8 2TB SATA drives: the first pair in RAID 1 for the system
>> drive, the remaining six each containing a distinct partition and
>> mount point, then specified in hdfs-site.xml in comma-delimited
>> fashion.  It seems to make more sense to use RAID at least for the
>> system drives, so the loss of one drive won't take down the entire
>> node.  Granted, data integrity wouldn't be affected, but how much
>> time do you want to spend rebuilding an entire node due to the loss
>> of one drive?  We considered using a smaller pair for the system
>> drives, but if they're all the same then we only need to stock one
>> type of spare drive.
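
For concreteness, here's a minimal sketch of the hdfs-site.xml entry I
was describing.  The property name is the 0.20-era dfs.data.dir, and
the /data/1 through /data/6 mount points are placeholders for wherever
the six data partitions actually get mounted:

    <property>
      <name>dfs.data.dir</name>
      <!-- one directory per physical disk; the DataNode round-robins
           new blocks across them -->
      <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn,/data/5/dfs/dn,/data/6/dfs/dn</value>
    </property>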
>
>
>    Don't bother RAID'ing the system drive.  Seriously.  You're giving
> up performance for something that rarely happens.  If you have decent
> configuration management, rebuilding a node is not a big deal and
> doesn't take that long anyway.
>
>    Besides, losing one of the JBOD disks will likely bring the node
> down anyway.
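
Side note: whether losing a single data disk really takes the whole
DataNode down looks like it's becoming tunable.  If I'm reading the
newer docs right, something along these lines lets a DataNode keep
running after one volume failure, though the property may not exist in
every 0.20-era build, so treat this as a sketch:

    <property>
      <!-- number of volumes allowed to fail before the DataNode
           shuts itself down; newer releases only -->
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <value>1</value>
    </property>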
>
>> Another question I have is whether using 1TB drives would be advisable
>> over 2TB for the purpose of reducing rebuild time.
>
>    You're overthinking the rebuild time.  Again, configuration
> management makes this a non-issue.
>
>
>> Or perhaps I'm still thinking of this as I would a RAID volume.  If
>> we needed to rebalance across the cluster, would the time needed be
>> more dependent on the amount of data involved and the connectivity
>> between nodes?
>
>    Yes.
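
That matches what I've seen in the docs: the rebalancer is a
command-line tool, and its network usage is capped by a config knob, so
both of those factors are directly visible.  A rough sketch, where the
10% threshold and 10 MB/s cap are just illustrative values:

    # run until no DataNode is more than 10% above or below the
    # cluster-wide average utilization
    hadoop balancer -threshold 10

with the bandwidth cap set in hdfs-site.xml:

    <property>
      <name>dfs.balance.bandwidthPerSec</name>
      <!-- bytes per second each DataNode may spend on balancing;
           10485760 = 10 MB/s -->
      <value>10485760</value>
    </property>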
>
>    When a node goes down, the data and tasks are automatically moved,
> so a node can be down for as long as it needs to be down.  The grid
> will still be functional, so don't panic if a compute node goes
> down. :)
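
Good to know.  For anyone who wants to keep an eye on the grid while a
node is being rebuilt,

    hadoop dfsadmin -report

lists the live and dead DataNodes along with per-node capacity and
usage, so you can confirm the cluster stays healthy in the meantime.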
>
>

