Date: Mon, 2 May 2011 12:56:08 -0700
Subject: Re: one of our datanodes stops working after few hours
From: Jack Levin
To: user@hbase.apache.org

Tried removing YourKit and running on Sun Java, same thing. We have
some threads blocked; does anyone know what they block on?

-Jack

On Mon, May 2, 2011 at 7:53 AM, Todd Lipcon wrote:
> Hi Jack,
>
> Does this happen even if you aren't running YourKit on the DN?
>
> Can you try using a Sun JDK instead of OpenJDK?
>
> -Todd
>
> On Sun, May 1, 2011 at 7:34 PM, Jack Levin wrote:
>
>> Version: 0.20.2+320 hdfs
>> .89 HBASE
>>
>> ulimit is 32k
>> xcievers is 5k
>>
>> Note from the jstack, I am not exceeding xcievers.
>>
>> -Jack
>>
>> On Sun, May 1, 2011 at 6:19 PM, Michael Segel wrote:
>> >
>> > What's your xceivers set to?
>> > What's the ulimit -n set for hdfs/hadoop user... (You didn't say which release/version you were using.)
>> >
>> >> Date: Sun, 1 May 2011 17:47:18 -0700
>> >> Subject: one of our datanodes stops working after few hours
>> >> From: magnito@gmail.com
>> >> To: user@hbase.apache.org
>> >>
>> >> I took a jstack (http://pastebin.com/5v6mHg3t). After a few hours, it
>> >> literally staggers to a halt and gets very, very slow... Any ideas
>> >> what it's blocking on?
>> >> (The main issue is that fsreads for the RS get really slow when that happens.)
>> >>
>> >> -Jack
>> >
>
> --
> Todd Lipcon
> Software Engineer, Cloudera