hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: NotServingRegionException - Map/Reduce process fails
Date Thu, 23 Oct 2008 18:30:15 GMT
Find the MR task that failed and click through the UI to look at its 
logs; they may have interesting info.  It's probably complaining about a 
region not being available (NotServingRegionException, NSRE).  Figure 
out which region it is.  Then use the region historian, or grep the 
master logs for the region name while filtering out the metaScanner 
noise -- 'grep REGIONNAME MASTERLOG | grep -v metaScanner' -- to see if 
you can piece together the region's history around the failure.  Look 
too at the load around failure time: were you swapping, etc.?  (Ganglia 
or some such helps here.)
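
The log search above could be wrapped up roughly as follows.  This is 
only a sketch: region_history is a made-up helper name, and the log path 
in the example is an assumption, so point it at wherever your master 
actually writes its logs.  It matches the region name as a fixed string 
because region names can contain characters that grep would otherwise 
treat as regex syntax.

```shell
# Hypothetical helper (not part of HBase): print the log lines that
# mention a region, minus the periodic metaScanner chatter.
#   $1      region name, matched as a fixed string
#   $2...   one or more master log files
region_history() {
  region="$1"
  shift
  grep -F -- "$region" "$@" | grep -v metaScanner
}

# Example invocation; the path below is an assumption about your layout:
# region_history 'REGIONNAME' logs/hbase-*-master-*.log
```

Reading the output in time order around the failure should show 
whether the region was being closed, split, or reassigned when the 
task hit the NSRE.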

You might also test that the table is still wholesome -- that the MR job 
didn't damage it.  A quick check that all regions are online and 
accessible is to scan for a column whose column family exists but 
whose qualifier you know is not present.  E.g. if you have column family 
'page' and you know there is no column 'page:xyz', scan with that: 
"scan 'TABLENAME', ['page:xyz']".  (Enable DEBUG in log4j so you can see 
regions being loaded as the scan progresses.)

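You might need to up the client timeouts/retries (e.g. 
hbase.client.retries.number and hbase.client.pause in hbase-site.xml) 
so the reduce tasks can ride out a split.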

Dru Jensen wrote:
> Hi hbase-users,
> During a fairly large MR process, in the Reduce phase as it's writing 
> its results to a table, I see 
> org.apache.hadoop.hbase.NotServingRegionException in the region server 
> log several times and then I see a split reporting it was successful.
> Eventually, the Reduce process fails with 
> org.apache.hadoop.hbase.client.RetriesExhaustedException after 10 
> failed attempts.
> What can I do to fix it?
> Thanks,
> Dru
