hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: wrong region exception
Date Wed, 01 Jun 2011 22:34:04 GMT
We can't reach the server carrying .META. within 60 seconds.  What's
going on on that server?  The next time the CatalogJanitor below
runs, does it succeed, or does it always fail?


On Wed, Jun 1, 2011 at 2:27 PM, Robert Gonzalez
<Robert.Gonzalez@maxpointinteractive.com> wrote:
> This is basically it (for the first time it died while copying), we have it at warn level and above:
> 2011-05-27 16:44:27,565 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
> java.net.SocketTimeoutException: Call to / failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/ remote=/]
>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:784)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy6.delete(Unknown Source)
>        at org.apache.hadoop.hbase.catalog.MetaEditor.deleteDaughterReferenceInParent(MetaEditor.java:201)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.removeDaughterFromParent(CatalogJanitor.java:233)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.hasReferences(CatalogJanitor.java:275)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.checkDaughter(CatalogJanitor.java:202)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.cleanParent(CatalogJanitor.java:166)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:85)
>        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/ remote=/]
>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>        at java.io.FilterInputStream.read(FilterInputStream.java:116)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:281)
>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>        at java.io.DataInputStream.readInt(DataInputStream.java:370)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Wednesday, June 01, 2011 12:29 PM
> To: user@hbase.apache.org
> Subject: Re: wrong region exception
> On Wed, Jun 1, 2011 at 7:32 AM, Robert Gonzalez <Robert.Gonzalez@maxpointinteractive.com>
>> We have a table copy program that copies the data from one table to another, and we can give it the start/end keys.  In this case we created a new blank table with the essential column families and let it run with start/end to be the whole range, 0-maxkey.  At about 30% of the way through, which is roughly 600 million rows, it died trying to write to the new table with the wrong region exception.  When we tried to restart the copy from that key + some delta, it still crapped out.  No explanation in the logs the first time, but a series of timeouts in the second run.  Now we are trying the copy again with a new table.
> Robert:
> Do you have the master logs for this copy run still?  If so, put them somewhere
> where I can pull them (or send them to me) and I'll take a look.  I'd like to see
> the logs from the cluster to which you were copying the data.
> St.Ack
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>> Stack
>> Sent: Tuesday, May 31, 2011 6:42 PM
>> To: user@hbase.apache.org
>> Subject: Re: wrong region exception
>> So, what about this new WrongRegionException in the new cluster?  Can you figure out how it came about?  In the new cluster, is there also a hole?  Did you start the new cluster fresh or copy from the old cluster?
>> St.Ack
>> On Tue, May 31, 2011 at 1:55 PM, Robert Gonzalez <Robert.Gonzalez@maxpointinteractive.com>
>>> Yeah, we learned the hard way early last year to follow the guidelines religiously.  I've gone over the requirements and checked off everything.  We even re-did our tables to only have 4 column families, down from 4x that amount.  We are at a loss to find out why we seem to be cursed when it comes to HBase.  Hadoop is performing like a charm, pretty much every machine is busy 24/7.
>>> -----Original Message-----
>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>>> Stack
>>> Sent: Tuesday, May 31, 2011 3:03 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: wrong region exception
>>> On Tue, May 31, 2011 at 10:42 AM, Robert Gonzalez <Robert.Gonzalez@maxpointinteractive.com>
>>>> Now I'm getting the wrong region exception on the new table that I'm copying the old table to.  Running hbck reveals an inconsistency in the new table.  The frustration is unbelievable.  Like I said before, it doesn't appear that HBase is ready for prime time.  I don't know how companies are using this successfully, it doesn't appear plausible.
>>> Sorry you are not having a good experience.  I've not seen WrongRegionException in ages (grep these lists yourself).  Makes me suspect your environment.  For sure you've read the requirements section in the manual and set up ulimits, nprocs and xceivers?
>>> St.Ack
