hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Can we replace namenode machine with some other machine ?
Date Thu, 22 Sep 2011 16:13:24 GMT

I agree with Steve except on one thing...

RAID 5 bad. RAID 10 (1+0) good.

Sorry, this goes back to my RDBMS days, where RAID 5 would kill your performance, and worse...
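
A rough sketch of the arithmetic behind that complaint, assuming the usual textbook write penalties (4 back-end I/Os per small random write on RAID 5, 2 on RAID 10). The disk count and per-disk IOPS figure are made-up illustration values, not measurements, and real controllers with write caches will behave differently:

```python
# Back-of-the-envelope model of small random-write cost, RAID 5 vs RAID 10.
# Assumed textbook write penalties (not measured on any real array):
#   RAID 5  -> 4 back-end I/Os per write (read data, read parity,
#              write data, write parity)
#   RAID 10 -> 2 back-end I/Os per write (one write per mirror side)
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def effective_write_iops(level, disks, iops_per_disk):
    """Approximate host-visible random-write IOPS for the whole array."""
    return disks * iops_per_disk // WRITE_PENALTY[level]


def usable_disks(level, disks):
    """Number of disks' worth of usable capacity at the given RAID level."""
    if level == "raid5":
        return disks - 1   # one disk's worth of capacity goes to parity
    if level == "raid10":
        return disks // 2  # every disk is mirrored
    raise ValueError("unknown RAID level: " + level)


if __name__ == "__main__":
    # Hypothetical array: 4 disks at 150 random IOPS each.
    for level in ("raid5", "raid10"):
        print(level,
              "write IOPS:", effective_write_iops(level, 4, 150),
              "usable disks:", usable_disks(level, 4))
```

Under these assumptions RAID 10 gives up capacity in exchange for roughly twice the random-write throughput, which is the trade-off I care about here.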



> Date: Thu, 22 Sep 2011 11:28:39 +0100
> From: stevel@apache.org
> To: common-user@hadoop.apache.org
> Subject: Re: Can we replace namenode machine with some other machine ?
> 
> On 22/09/11 05:42, praveenesh kumar wrote:
> > Hi all,
> >
> > Can we replace our namenode machine with another machine later?
> > I have a new server machine in my cluster, and I want to make
> > it my new namenode and jobtracker node.
> > Also, does the Namenode/JobTracker machine's configuration need to be better
> > than the datanodes'/tasktrackers'?
> >
> 
> 1. I'd give it lots of RAM: it holds metadata about every file in 
> memory, and you want to avoid swapping.
> 
> 2. I'd make sure the disks are RAID5, with some NFS-mounted FS that the 
> secondary namenode can talk to. This avoids the risk of losing the index, 
> which, if it happens, renders your filesystem worthless. If I were really 
> paranoid I'd have twin RAID controllers with separate connections to 
> disk arrays in separate racks, as [Jiang2008] shows that interconnect 
> problems on disk arrays can be more frequent than HDD failures.
> 
> 3. If your central switches are at 10 GbE, consider getting a 10 GbE NIC 
> and hooking it up directly. This stops the network being the bottleneck, 
> though it does mean the server can have a lot more packets hitting it, 
> which puts more load on it.
> 
> 4. Leave space for a second CPU and time for GC tuning.
> 
> 
> JTs are less important; they need RAM but use HDFS for storage. If your 
> cluster is small, the NN and JT can run on the same machine. If you do 
> this, set up DNS so that two hostnames point to the same network address. 
> Then if you ever split them off, everyone whose bookmark says 
> http://jobtracker won't notice.
> 
> Either way: the NN and the JT are the machines whose availability you 
> care about. The rest is just a source of statistics you can look at later.
> 
> -Steve
> 
> 
> 
> [Jiang2008] "Are disks the dominant contributor for storage failures?: A 
> comprehensive study of storage subsystem failure characteristics". ACM 
> Transactions on Storage.
> 
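
On the suggestion above of giving a co-located NN and JT two hostnames that point to the same address: a minimal Python sketch for checking that both names resolve to one address. The hostnames in the usage note are hypothetical placeholders, and the resolver is a parameter so the check can be exercised without live DNS:

```python
import socket


def same_address(host_a, host_b, resolve=socket.gethostbyname):
    """Return True if both hostnames resolve to the same IPv4 address.

    `resolve` defaults to the standard-library lookup; pass a different
    callable to test without touching real DNS.
    """
    return resolve(host_a) == resolve(host_b)
```

For example, same_address("namenode", "jobtracker") should return True while both services share a machine, and False once the jobtracker name has been repointed at a new host. A failed lookup raises socket.gaierror.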
 		 	   		  