From: Mike Drob
Date: Fri, 14 Mar 2014 15:28:33 -0400
Subject: Re: HA namenode questions
To: dev@accumulo.apache.org

Specifically, for dealing with a large number of clients, you can use
ZooKeeper Observers.

---------- Forwarded message ----------
From: Eric Newton
Date: Fri, Mar 14, 2014 at 3:18 PM
Subject: HA namenode questions
To: dev@accumulo.apache.org

For those of you running HA NN on large clusters, I'm looking for some
advice.

I was looking at an HA NN config today. Either by default, or by following
the configuration instructions, I saw that the ZooKeeper timeout was set to
5 seconds.

* Is this a reasonable timeout?
* Do you provide HA NN its own set of ZooKeepers?

We have seen problems with large GC pauses on tablet servers. This happens
less and less as we have learned more tricks, but I'm constantly talking to
users who want their ZooKeeper timeout as high as two minutes.

We have also had to increase the number of ZooKeepers on our largest
clusters in order to handle the "thundering herd" load when large
map/reduce jobs kick off and they all start talking to Accumulo, which
requires reading information from ZooKeeper.

Any experience you can share about HA NN configuration at scales over a few
hundred nodes would be appreciated.

-Eric
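
One detail worth keeping in mind when comparing a 5-second timeout with a two-minute one: a ZooKeeper server does not necessarily grant the timeout a client asks for. By default it clamps the request between 2 x tickTime and 20 x tickTime, unless minSessionTimeout and maxSessionTimeout are set in the server config. A rough sketch of that arithmetic follows. The function name is hypothetical, and the tickTime of 2000 ms is an assumption (it is the value in ZooKeeper's sample config); nothing here comes from the cluster discussed above.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Estimate the session timeout (ms) a ZooKeeper server would grant.

    Hypothetical helper for illustration only. It mirrors the documented
    server defaults: minSessionTimeout = 2 * tickTime and
    maxSessionTimeout = 20 * tickTime when neither is set explicitly.
    """
    lower = (2 * tick_time_ms if min_session_timeout_ms is None
             else min_session_timeout_ms)
    upper = (20 * tick_time_ms if max_session_timeout_ms is None
             else max_session_timeout_ms)
    # Clamp the client's request into the [lower, upper] window.
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # Assumed tickTime of 2000 ms gives a default window of 4 s to 40 s.
    for requested in (5_000, 30_000, 120_000):
        granted = negotiated_session_timeout(requested)
        print(f"requested {requested} ms, granted {granted} ms")
```

Under those assumptions, a two-minute request would only be honored if maxSessionTimeout were raised on the servers (or tickTime increased), whereas a 5-second request falls inside the default window.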