Subject: Re: Acceptance tests
From: Allen Wittenauer <aw@apache.org>
Date: Mon, 16 May 2011 12:46:16 -0700
To: <general@hadoop.apache.org>


On May 16, 2011, at 11:03 AM, Evert Lammerts wrote:

> Hi all,
>
> What acceptance tests are people using when buying clusters for Hadoop? Any pointers to relevant methods?


	We get some test nodes from various manufacturers.  We do some raw IO benchmarking against our other nodes.  We add them to our various grids to see how they perform in the real world, paying attention to average task turnaround time for certain jobs.  Since we know where our current machines stand, we can look at price/performance improvements.
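The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the poster's actual tooling: the `TaskRecord` type, its fields, and the per-node cost inputs are hypothetical. It assumes task start/finish times have already been pulled from your own job history, and that test and current nodes run the same number of task slots, so that 1 / (average task time) is a fair per-node throughput proxy.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """One completed task (hypothetical record, not a Hadoop log format)."""
    node_group: str   # e.g. "current" or "vendor-A-test"
    start_s: float    # task start time, in seconds
    finish_s: float   # task finish time, in seconds


def avg_turnaround(tasks: list[TaskRecord]) -> dict[str, float]:
    """Average task turnaround (finish - start) per node group, in seconds."""
    by_group: dict[str, list[float]] = {}
    for t in tasks:
        by_group.setdefault(t.node_group, []).append(t.finish_s - t.start_s)
    return {group: mean(times) for group, times in by_group.items()}


def price_perf_gain(baseline_cost: float, baseline_avg_s: float,
                    candidate_cost: float, candidate_avg_s: float) -> float:
    """Candidate throughput-per-dollar divided by baseline throughput-per-dollar.

    With throughput taken as 1 / avg task time, throughput per dollar is
    1 / (cost * avg_time), so the ratio simplifies to the expression below.
    A result above 1.0 means the candidate does more work per dollar.
    """
    return (baseline_cost * baseline_avg_s) / (candidate_cost * candidate_avg_s)
```

Because the inputs come from your own jobs rather than a synthetic benchmark, a gain above 1.0 says the candidate node does more of *your* work per dollar than the machines you already run.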

	Other random things that I think are important:

		a) Unless someone shares their entire *-site.xml configuration, most benchmarks published on the net are largely useless.  Simple things like block size have a big impact.

		b) Test your actual workload.  Synthetic benchmarks are just that: synthetic.  They may not reflect the particular nuances of your job.

		c) Establish a baseline.  If you have no hardware today, then at least establish one on EC2 to compare against.

		d) Make sure you talk to multiple vendors.

		e) Any advice anyone gives you on config is likely going to be wrong.
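Point (a) is easier to act on if you diff configurations before comparing any numbers. Below is a minimal sketch, assuming the usual Hadoop *-site.xml layout of a `<configuration>` root holding `<property>` elements, each with `<name>` and `<value>` children. The helper names are hypothetical, and the sketch ignores `<final>` tags, XInclude, and `${var}` expansion.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def parse_site_xml(text: str) -> Dict[str, str]:
    """Parse the text of a Hadoop *-site.xml file into {property name: value}."""
    root = ET.fromstring(text)  # the <configuration> element
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def diff_site_xml(a: Dict[str, str],
                  b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (value_in_a, value_in_b)} for every property that differs.

    A property set on only one side appears as None on the other side,
    meaning that cluster falls back to its Hadoop version's default.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

Feed it the text of each file, e.g. `open(path).read()`. Any differing entry, such as the block size (`dfs.block.size` on 0.20-era releases), is a reason to distrust a head-to-head comparison until it is explained.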