hadoop-common-user mailing list archives

From Aaron Tokhy <aaron.to...@resonatenetworks.com>
Subject Re: Hadoop Datacenter Setup
Date Tue, 31 Jan 2012 16:33:58 GMT
Hey Patrick, thanks for your reply.
> I'm still skeptical when it comes to flash drives, especially as pertains
> to Hadoop. The write cycle limit is impractical to make them usable for
> dfs.data.dir and mapred.local.dir, and as you pointed out, you can't use
> them for logs either.
I was going to use the USB drives as a persistent store for the root 
filesystem.  Hadoop will be doing all of its work on HDFS, which will be 
stored on /mnt/d0, /mnt/d1 and /mnt/d2.  The system will not be used for 
any other purpose, so it's highly unlikely there will be writes to the USB 
drive other than occasional updates to the Hadoop cluster configuration 
and whatever system logs rotate over time.
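
For reference, here is roughly the fstab I have in mind to keep writes 
off the USB stick.  The filesystem type and mount options are my own 
assumptions at this point and nothing below has been tested:

# /etc/fstab sketch -- devices, fs type and options are assumptions, untested
# root on the USB stick, read-only if we can get away with it
/dev/sda1   /         ext4   noatime,ro   1 1
/dev/sda2   /var      ext4   noatime      1 2
/dev/sda3   /tmp      ext4   noatime      1 2
# HDFS data disks (d0 also holds HADOOP_LOG_DIR)
/dev/sdb    /mnt/d0   ext4   noatime      0 0
/dev/sdc    /mnt/d1   ext4   noatime      0 0
/dev/sdd    /mnt/d2   ext4   noatime      0 0
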
> If you put HADOOP_LOG_DIR in /mnt/d0, you will still have to shut down the
> TT and DN in order to replace the drive. So you may as well just carve out
> 100GB from that drive and put your root filesystem there.
The setup here may involve a RAID 1 of 100GB partitions.  This was 
another option I was considering, since it would allow hot-swapping 
drives while the node is alive, but it needs some handiwork with mdadm.
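
For the record, the mdadm steps I have in mind look roughly like this 
(the 100GB partition names are placeholders and I have not run this on 
the actual hardware yet):

# build the OS mirror out of two 100GB partitions (names are placeholders)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mkfs.ext4 /dev/md0
mdadm --detail --scan >> /etc/mdadm.conf

# replacing a failed member later, while the node stays up
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
# (physically swap the disk and recreate the 100GB partition)
mdadm /dev/md0 --add /dev/sdb1
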
> I'd say that unless you're running some extremely CPU-heavy workloads, you
> should consider getting more than 3 drives per node. Most shops get 6-12
> drives per node (with dual quad or hex core processors). Then you can
> sacrifice one of the drives for swap and the OS.

We have 3 drives per node due to having 4 blades per 2U.  It's a physical 
datacenter constraint given our requirements.

> I'd keep the RegionServer heap at 12GB or under to mitigate long GC pauses
> (the bigger the heap, the longer the eventual full GC).
Agreed.
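
Concretely, I expect hbase-env.sh to end up with something like the 
following; the heap size and CMS flags are just the usual starting 
points, nothing we have benchmarked ourselves:

# hbase-env.sh sketch -- values are untuned starting points
export HBASE_HEAPSIZE=12288   # 12 GB heap, per the advice above
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
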
> Finally, you can run Hive on the same cluster as HBase, just be wary of
> load spikes due to MR jobs and configure properly. You don't want a large
> Hive query to knock out your RegionServers thereby causing cascading
> failures.
We were thinking about a separate cluster that would just run Hive jobs, 
but we do not have that flexibility at the moment.
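
If we do end up running Hive MR jobs alongside the RegionServers, the 
plan would be to cap the task slots in mapred-site.xml along these 
lines (the slot counts and child heap are rough guesses against our 
4-8GB task budget, not something we have validated):

<!-- mapred-site.xml sketch: keep MR load (Hive included) from starving
     the RegionServer; values below are guesses, not tested -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>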


On 01/30/2012 09:17 PM, Patrick Angeles wrote:
> Hey Aaron,
>
> I'm still skeptical when it comes to flash drives, especially as pertains
> to Hadoop. The write cycle limit is impractical to make them usable for
> dfs.data.dir and mapred.local.dir, and as you pointed out, you can't use
> them for logs either.
>
> If you put HADOOP_LOG_DIR in /mnt/d0, you will still have to shut down the
> TT and DN in order to replace the drive. So you may as well just carve out
> 100GB from that drive and put your root filesystem there.
>
> I'd say that unless you're running some extremely CPU-heavy workloads, you
> should consider getting more than 3 drives per node. Most shops get 6-12
> drives per node (with dual quad or hex core processors). Then you can
> sacrifice one of the drives for swap and the OS.
>
> I'd keep the RegionServer heap at 12GB or under to mitigate long GC pauses
> (the bigger the heap, the longer the eventual full GC).
>
> Finally, you can run Hive on the same cluster as HBase, just be wary of
> load spikes due to MR jobs and configure properly. You don't want a large
> Hive query to knock out your RegionServers thereby causing cascading
> failures.
>
> - P
>
> On Mon, Jan 30, 2012 at 6:44 PM, Aaron Tokhy<
> aaron.tokhy@resonatenetworks.com>  wrote:
>
>> I forgot to add:
>>
>> Are there use cases for using a swap partition for Hadoop nodes if our
>> combined planned heap size is not expected to go over 24GB for any
>> particular node type?  I've noticed that if HBase starts to GC, it will
>> pause for unreasonable amounts of time if old pages get swapped to disk,
>> causing the regionserver to crash (which we've mitigated by setting
>> vm.swappiness=5).
>>
>> Our slave node template will have a 1 GB heap Task Tracker, a 1 GB heap
>> Data Node and a 12-16GB heap RegionServer.  We assume the OS memory
>> overhead is 1 GB.  We added another 1 GB for combined Java VM overhead
>> across services, which comes up to be around a max of 16-20GB used.  This
>> gives us around 4-8GB for tasks that would work with HBase.  We may also
>> use Hive on the same cluster for queries.
>>
>>
>>
>> On 01/30/2012 05:40 PM, Aaron Tokhy wrote:
>>
>>> Hi,
>>>
>>> Our group is trying to set up a prototype for what will eventually
>>> become a cluster of ~50 nodes.
>>>
>>> Anyone have experiences with a stateless Hadoop cluster setup using this
>>> method on CentOS?  Are there any caveats with a read-only root file
>>> system approach?  This would save us from having to keep a root volume
>>> on every system (whether it is installed on a USB thumb drive, or a RAID
>>> 1 of bootable / partitions).
>>>
>>> http://citethisbook.net/Red_Hat_Introduction_to_Stateless_Linux.html
>>>
>>> We would like to keep the OS root file system separate from the Hadoop
>>> filesystem(s) for maintenance reasons (we can hot swap disks while the
>>> system is running)
>>>
>>> We were also considering installing the root filesystem on USB flash
>>> drives, making it persistent yet separate.  However we would identify
>>> and turn off anything that would cause excess writes to the root
>>> filesystem given the limited number of USB flash drive write cycles
>>> (keep IO writes to the root filesystem to a minimum).  We would do this
>>> by storing the Hadoop logs on the same filesystem/drive as what we
>>> specify in dfs.data.dir/dfs.name.dir.
>>>
>>> In the end we would have something like this:
>>>
>>> USB (MS DOS partition table + ext2/3/4 partitions)
>>> /dev/sda
>>> /dev/sda1    mounted as /        (possibly read-only)
>>> /dev/sda2    mounted as /var    (read-write)
>>> /dev/sda3    mounted as /tmp    (read-write)
>>>
>>> Hadoop Disks (no partition table or GPT since these are 3TB disks)
>>> /dev/sdb    /mnt/d0
>>> /dev/sdc    /mnt/d1
>>> /dev/sdd    /mnt/d2
>>>
>>> /mnt/d0 would contain all Hadoop logs.
>>>
>>> Hadoop configuration files would still reside on /
>>>
>>>
>>> Any issues with such a setup?  Are there better ways of achieving this?
>>>
