From: Allen Wittenauer
Date: Mon, 16 May 2011 12:46:16 -0700
Subject: Re: Acceptance tests
Reply-To: general@hadoop.apache.org

On May 16, 2011, at 11:03 AM, Evert Lammerts wrote:

> Hi all,
>
> What acceptance tests are people using when buying clusters for Hadoop? Any pointers to relevant methods?

We get some test nodes from various manufacturers. We do some raw IO benchmarking vs. our other nodes. We add them to our various grids to see how they perform in the real world, paying attention to average task turnaround time for certain jobs.
Since we know where our current machines are at, we can look at price per perf improvements.

Other random things that I think are important:

a) Unless someone shares their entire *-site.xml data, most published benchmarks on the net are mostly useless. Simple things like block size have a big impact.

b) Test your actual workload. Synthetic benchmarks are just that--synthetic. They may not reflect the particular nuances of your job.

c) Establish a baseline. If you have no hardware today, then at least establish something on EC2 to compare against.

d) Make sure you talk to multiple vendors.

e) Any advice anyone gives you on config is likely going to be wrong.
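
For the price-per-perf comparison against a baseline mentioned above, here is a rough sketch in Python. It is not the tooling actually used on these grids. The names NodeResult, price_per_perf and rank_candidates are hypothetical, the metric (price divided by speedup in average task time for one reference job) is only one plausible reading of "price per perf", and the figures in the usage example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeResult:
    """Measurement for one node type on a single reference job (hypothetical schema)."""
    name: str
    price_usd: float         # assumed purchase price per node
    avg_task_seconds: float  # average task turnaround time on your real workload


def price_per_perf(node: NodeResult, baseline: NodeResult) -> float:
    """Price divided by speedup relative to the baseline node; lower is better."""
    if node.avg_task_seconds <= 0 or baseline.avg_task_seconds <= 0:
        raise ValueError("task times must be positive")
    # A speedup above 1.0 means the candidate finishes tasks faster than the baseline.
    speedup = baseline.avg_task_seconds / node.avg_task_seconds
    return node.price_usd / speedup


def rank_candidates(candidates, baseline):
    """Return candidates sorted from best to worst price per unit of performance."""
    return sorted(candidates, key=lambda n: price_per_perf(n, baseline))


if __name__ == "__main__":
    # Illustrative, made-up numbers; substitute your own measurements.
    current = NodeResult("current", 4000.0, 120.0)
    test_nodes = [
        NodeResult("vendor_a", 5000.0, 80.0),
        NodeResult("vendor_b", 6000.0, 100.0),
    ]
    for node in rank_candidates(test_nodes, current):
        print(node.name, round(price_per_perf(node, current), 2))
```

The baseline scores its own price by construction, so any candidate scoring below that is cheaper per unit of work for that particular job. The result only holds for the job and configuration measured, so repeat it for each workload you care about.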