Message-ID: <31a243e70810231137x5d5b053cs35ec80a9fa2cb017@mail.gmail.com>
Date: Thu, 23 Oct 2008 14:37:39 -0400
From: "Jean-Daniel Cryans"
Sender: jdcryans@gmail.com
To: hbase-user@hadoop.apache.org
Subject: Re: NotServingRegionException - Map/Reduce process fails
In-Reply-To: <4900C2B7.3010008@duboce.net>
References: <1FD66D81-F3FC-4F89-B0C8-D0B480872252@gmail.com> <4900C2B7.3010008@duboce.net>

Dru.

See also if it's a case of HBASE-921, because it would make sense if you are
not using hbase 0.18.1 and are under a heavy load.

J-D

On Thu, Oct 23, 2008 at 2:30 PM, stack wrote:

> Find the MR task that failed. Click through the UI to look at its logs.
> It may have interesting info. It's probably complaining about a region not
> being available (NSRE). Figure out which region it is. Use the region
> historian or grep in the master logs -- 'grep -v metaScanner REGIONNAME' so
> you avoid the metaScanner noise -- to see if you can figure out the region's
> history around the failure. Look too at loading around failure time. Were
> you swapping, etc.? (Ganglia or some such helps here.)
>
> You might also test that the table is still wholesome -- that the MR job
> didn't damage the table.
> A quick check that all regions are onlined and accessible
> is to scan for a column whose column family does exist but whose qualifier
> you know is not present: e.g. if you have column family 'page' and you know
> there is no column 'page:xyz', scan with that (enable DEBUG in log4j so you
> can see regions being loaded as the scan progresses): "scan 'TABLENAME',
> ['page:xyz']".
>
> You might need to up the timeouts/retries.
> St.Ack
>
> Dru Jensen wrote:
>
>> Hi hbase-users,
>>
>> During a fairly large MR process, on the Reduce cycle as it's writing its
>> results to a table, I see org.apache.hadoop.hbase.NotServingRegionException
>> in the region server log several times and then I see a split reporting it
>> was successful.
>>
>> Eventually, the Reduce process fails with
>> org.apache.hadoop.hbase.client.RetriesExhaustedException after 10 failed
>> attempts.
>>
>> What can I do to fix it?
>>
>> Thanks,
>> Dru
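
A note on the log-filtering step quoted above: the intent appears to be to keep the master log lines that mention the region while dropping the metaScanner noise. A minimal Python sketch of that filter follows. The function name `region_history`, the `noise` parameter, and the sample lines are all made up for illustration; real master log lines look different.

```python
from typing import Iterable, Iterator


def region_history(lines: Iterable[str], region_name: str,
                   noise: str = "metaScanner") -> Iterator[str]:
    """Yield lines that mention region_name but not the noise marker."""
    for line in lines:
        if region_name in line and noise not in line:
            yield line.rstrip("\n")


# Made-up sample lines for illustration only; not real HBase log output.
SAMPLE_LOG = [
    "10:01 metaScanner scanning REGION_A\n",
    "10:02 assigning REGION_A to server1\n",
    "10:03 split of REGION_B complete\n",
    "10:04 REGION_A closed\n",
]

if __name__ == "__main__":
    # Print the history of REGION_A without the metaScanner entries.
    for entry in region_history(SAMPLE_LOG, "REGION_A"):
        print(entry)
```

Against a real log, the same function can be fed an open file handle in place of `SAMPLE_LOG`, with the region name taken from the failed task's NSRE message.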