From: Evert Lammerts
To: Abhishek Mehta, "general@hadoop.apache.org"
Date: Thu, 30 Jun 2011 23:40:15 +0200
Subject: RE: Hadoop Java Versions
Reply-To: general@hadoop.apache.org

That's not a question I'm qualified to answer. I do know we're now buying an Arista for a different cluster, but there are probably loads of others out there.

*forwarded to general@...*

________________________________________
From: Abhishek Mehta [abhishek@tresata.com]
Sent: Thursday, June 30, 2011 11:38 PM
To: Evert Lammerts
Subject: Fwd: Hadoop Java Versions

What are the other switch options (other than Cisco, that is)?

cheers

Abhishek Mehta
(e) abhishek@tresata.com
(v) 980.355.9855

Begin forwarded message:

From: Evert Lammerts
Date: June 30, 2011 5:31:26 PM EDT
To: "general@hadoop.apache.org"
Subject: RE: Hadoop Java Versions
Reply-To: general@hadoop.apache.org

You can get 12-24 TB in a server today, which means the loss of a server generates a lot of traffic - which argues for 10 GbE. But:
- big increase in switch cost, especially if you (CoI warning) go with Cisco
- there have been problems with things like BIOS PXE and lights-out management on 10 GbE - probably because the NICs are off the mainboard and not something the BIOS was expecting. This should improve.
- I don't know how well Linux works with ethernet that fast (field reports useful)
- the big threat is still ToR switch failure, as that will trigger a re-replication of every block in the rack.

Keeping the number of disks per node low and the number of nodes high should keep the impact of dead nodes under control.
A ToR switch failing is different - losing 30 nodes (~120 TB) at once cannot be fixed by adding more nodes; adding nodes actually increases the chance of a ToR switch failure. Although such a failure is quite rare to begin with, I guess.

The back-of-the-envelope calculation I made suggests that ~150 (1U) nodes should be fine with 1 Gb ethernet. (E.g., when 6 nodes fail in a cluster of 150 nodes with four 2 TB disks each, with HDFS 60% full, it takes around ~32 minutes to recover; 2 nodes failing should take around 640 seconds. Also see the attached spreadsheet.) This doesn't take ToR switch failure into account though. On the other hand - 150 nodes is only ~5 racks - in such a scenario you might rather shut the system down completely than let it re-replicate 20% of all data.

Cheers, Evert
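For reference, a minimal sketch of this kind of back-of-the-envelope recovery estimate (not the attached spreadsheet itself; the function name and the flat aggregate-NIC-bandwidth assumption are mine, and disk throughput, ToR uplinks and protocol overhead are ignored, so it lands in the same ballpark as the figures above rather than exactly on them):

#!/usr/bin/env python
# Rough re-replication time estimate after node failures in an HDFS cluster.
# Assumption (mine, not from the attached spreadsheet): recovery is limited
# only by the aggregate NIC bandwidth of the surviving nodes.

def recovery_time_seconds(total_nodes, failed_nodes, disks_per_node,
                          disk_tb, hdfs_fill, nic_gbit=1.0):
    """Estimate seconds needed to re-replicate the blocks of the failed nodes."""
    lost_tb = failed_nodes * disks_per_node * disk_tb * hdfs_fill   # data to re-create
    surviving = total_nodes - failed_nodes
    aggregate_gbyte_per_s = surviving * nic_gbit / 8.0              # Gbit/s -> GB/s
    return lost_tb * 1000.0 / aggregate_gbyte_per_s                 # TB -> GB

if __name__ == "__main__":
    for failed in (2, 6):
        t = recovery_time_seconds(total_nodes=150, failed_nodes=failed,
                                  disks_per_node=4, disk_tb=2.0, hdfs_fill=0.6)
        print("%d failed nodes: ~%.0f s (~%.0f min)" % (failed, t, t / 60))

With these simplified inputs it gives roughly 520 s for 2 failed nodes and roughly 27 minutes for 6, close to the numbers quoted above; losing a whole rack (30 nodes, ~20% of the data) makes the same formula blow up accordingly, which is the point about preferring a controlled shutdown in that case.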