Date: Fri, 22 Nov 2013 16:14:12 +0100
Subject: One Region Server fails - all M/R jobs crash.
From: David Koch
To: user@hbase.apache.org

Hello,

We are experiencing reliability problems when running M/R jobs over HBase tables. Specifically, a single Region Server crash is enough to make all running M/R jobs fail. My guess is that this is not normal with a replication factor of 3.

The HBase version is 0.94.6, installed as part of Cloudera 4.4. HBase settings are the defaults. The cluster has 30 machines.

What steps can I take to improve the situation?

Thank you,

/David