hbase-user mailing list archives

From Mat Hofschen <hofsc...@gmail.com>
Subject Re: RetriesExhaustedException for TableReduce
Date Wed, 18 Mar 2009 14:47:54 GMT
Have you monitored system statistics on the machine in question? On our
test cluster (33 nodes), 120 reduce tasks were trying to write into one
region. That machine showed 100% CPU and a lot of swapping. Basically, we
now make sure only to import into tables whose regions are already well
distributed. We also lowered the maximum number of concurrent reduce
tasks, the memory per Java process (-Xmx), and the region size (to 64 MB).
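For reference, those three knobs live roughly here in the stock
configuration files; the values below are only what we happened to use, so
treat them as illustrative, and double-check the property names against
your Hadoop/HBase version:

  <!-- hadoop-site.xml: cap concurrent reduce tasks per tasktracker -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>

  <!-- hbase-site.xml: split regions at 64 MB instead of the 256 MB default -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>67108864</value>
  </property>

  # conf/hbase-env.sh: region server heap in MB (becomes -Xmx)
  export HBASE_HEAPSIZE=1000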

Did you check the log files on the server that rejected the connections?
Perhaps if you turn on debug logging you'll find out more.
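If you do turn on debugging: the usual switch for the HBase daemons is in
conf/log4j.properties (this assumes the stock Log4j setup that ships with
HBase):

  log4j.logger.org.apache.hadoop.hbase=DEBUG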

Matthias

On Wed, Mar 18, 2009 at 1:26 PM, Yair Even-Zohar
<yaire@audiencescience.com> wrote:

> I believe it is number (2) below. I'm getting
> "RetriesExhaustedException" for exactly the same region server in all my
> reduce jobs.
>
> How did you get around this problem?
>
> Thanks
> -Yair
>
> -----Original Message-----
> From: Mat Hofschen [mailto:hofschen@gmail.com]
> Sent: Wednesday, March 18, 2009 11:38 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: RetriesExhaustedException for TableReduce
>
> Hi Yair,
> Check the logs of the machine that refuses the connection. I had two
> problems during large imports:
> 1. "Too many open files" (see http://wiki.apache.org/hadoop/Hbase/FAQ,
> item 6)
> 2. Regions not distributed, heavy write access to one machine.
>
> Hope this helps,
> Matthias
>
> On Tue, Mar 17, 2009 at 11:19 PM, Yair Even-Zohar
> <yaire@audiencescience.com> wrote:
>
> > While loading a large amount of data into a non-empty table using
> > TableReduce, I get the error below.
> >
> > The first 1-3 reduces are usually successful, and then I get this
> > message.
> >
> > This error has occurred when I'm using either 2 or 8 servers, and
> > regardless of the number of reduces (4, 16, or 160). It did not occur
> > when loading a small amount of data (well, the first few reduces are
> > successful anyway).
> >
> > I googled "org.apache.hadoop.hbase.client.RetriesExhaustedException:"
> > without much help.
> >
> > Thanks
> >
> > -Yair
> >
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> > contact region server 10.249.203.0:60020 for region
> > ase,RnpdOFZn-goAAADK-uMA,1237315693597, row 'T82JYnln-goAAACeGdMA',
> > but failed after 10 attempts.
> > Exceptions:
> > java.io.IOException: Call to /10.249.203.0:60020 failed on local
> > exception: java.io.EOFException
> > java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> > connection exception: java.net.ConnectException: Connection refused
> > java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> > connection exception: java.net.ConnectException: Connection refused
> > java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> > connection exception: java.net.ConnectException: Connection refused
> > java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> > connection exception: java.net.ConnectException: Connection refused
> > java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> > connection exception: java.net.ConnectException: Connection refused
> > java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> > connection exception: java.net.ConnectException: Connection refused
> > java.net.ConnectException: Call to /10.249.203.0:60020 failed on
> > connection exception: java.net.ConnectException: Connection refused
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> > trying to locate root region
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> > trying to locate root region
> >
> >        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:841)
> >        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:932)
> >        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
> >        at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
> >        at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
> >        at org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:73)
> >        at org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:53)
> >        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:405)
> >        at com.revenuescience.audiencesearch.fba.ClogUploader$TableUploader.reduce(ClogUploader.java:223)
> >        at com.revenuescience.audiencesearch.fba.ClogUploader$TableUploader.reduce(ClogUploader.java:202)
> >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
> >        at org.apache.hadoop.mapred.Child.main(Child.java:155)
