hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: wrong region exception
Date Thu, 02 Jun 2011 20:25:07 GMT
It can't get to c1-s19.  It times out trying to connect.  Can you
figure out what's up with that?  On writes always going to the same
server, is this a case of http://hbase.apache.org/book.html#timeseries?
Or perhaps regions split and move elsewhere, but distcp is writing from
the src in order?
St.Ack
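[The timeseries issue referenced above is row-key hotspotting: with monotonically increasing keys, every new write lands in the single region holding the tail of the key space, so one regionserver takes all the load. A toy illustration in plain Python — the region boundaries are made up for the sketch, this is not HBase code:]

```python
import hashlib

# Hypothetical boundaries splitting a hex key space into four regions:
# region 0: [.., "4"), 1: ["4", "8"), 2: ["8", "c"), 3: ["c", ..)
boundaries = ["4", "8", "c"]

def region_for(key):
    # Regions cover [start, end); pick the first boundary the key sorts below.
    for i, b in enumerate(boundaries):
        if key < b:
            return i
    return len(boundaries)

# Monotonically increasing keys (timestamps, sequence ids) all sort into
# the same region -- one server absorbs every write.
seq_keys = [f"{i:08d}" for i in range(1000)]
seq_hits = {region_for(k) for k in seq_keys}

# Hashed keys (like the MD5-style keys in this thread) spread evenly.
hashed_keys = [hashlib.md5(k.encode()).hexdigest() for k in seq_keys]
hashed_hits = {region_for(k) for k in hashed_keys}

print(seq_hits)     # a single region takes all the sequential writes
print(hashed_hits)  # hashed keys touch every region
```

[Since the keys in this thread are already hashes, spread should be good — which points at Stack's second hypothesis: a copy that proceeds in sorted key order only ever writes to the one region at the current key, wherever that region lives.]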

On Thu, Jun 2, 2011 at 12:49 PM, Robert Gonzalez
<Robert.Gonzalez@maxpointinteractive.com> wrote:
> First, a clarification: everything is happening within the context of a single cluster of 55 machines; there is no inter-cluster copying.  I restarted c1-s06, the regionserver that died, and by the way, all new data seems to be going to this server first.  Is there a reason for this?  From the beginning of the copy until it crashed, c1-s06 always served the latest keys, no other server.  So after I restarted c1-s06, it keeps dying.  Here is one of the crashes:
>
> 2011-06-02 13:29:07,546 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=c1-s06.atxd.maxpointinteractive.com,60020,1306860799744, load=(requests=0, regions=136, usedHeap=307, maxHeap=2999): Failed open of daughter urlhashv4,837743DFAE34D105BB5B1E81810627B8,1307039228513.d1c0eb0aef559f349fdaca3452f55c10.
> java.net.SocketTimeoutException: Call to c1-s19.atxd.maxpointinteractive.com/10.100.1.26:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.7:41085 remote=c1-s19.atxd.maxpointinteractive.com/10.100.1.26:60020]
>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:784)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy8.put(Unknown Source)
>        at org.apache.hadoop.hbase.catalog.MetaEditor.addDaughter(MetaEditor.java:97)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1350)
>        at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:328)
>        at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:296)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.7:41085 remote=c1-s19.atxd.maxpointinteractive.com/10.100.1.26:60020]
>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>        at java.io.FilterInputStream.read(FilterInputStream.java:116)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:281)
>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>        at java.io.DataInputStream.readInt(DataInputStream.java:370)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
> 2011-06-02 13:29:07,551 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=c1-s06.atxd.maxpointinteractive.com,60020,1306860799744, load=(requests=0, regions=136, usedHeap=307, maxHeap=2999): Failed open of daughter urlhashv4,836A60E0F78046975AD00B84CC0B71FB,1307039228513.be6575478b7f4c22d6540a60c7c45fbb.
> java.net.SocketTimeoutException: Call to c1-s19.atxd.maxpointinteractive.com/10.100.1.26:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.7:41085 remote=c1-s19.atxd.maxpointinteractive.com/10.100.1.26:60020]
>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:784)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy8.put(Unknown Source)
>        at org.apache.hadoop.hbase.catalog.MetaEditor.addDaughter(MetaEditor.java:97)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1350)
>        at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:328)
>        at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:296)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.7:41085 remote=c1-s19.atxd.maxpointinteractive.com/10.100.1.26:60020]
>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>        at java.io.FilterInputStream.read(FilterInputStream.java:116)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:281)
>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>        at java.io.DataInputStream.readInt(DataInputStream.java:370)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
>
>
>
>
>
> .....
>
>
>
> 2011-06-02 13:29:09,802 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/urlhashv4/2f21ad9ef74c6e6d69259e9525f7b863/splits/0cf36594d8e349830d3b4f175b5b688a/crawl/1293482649614989303.2f21ad9ef74c6e6d69259e9525f7b863 : java.io.IOException: Error Recovery for block blk_3518061384037278533_81210851 failed because recovery from primary datanode 10.100.2.27:50010 failed 6 times.  Pipeline was 10.100.2.27:50010. Aborting...
> java.io.IOException: Error Recovery for block blk_3518061384037278533_81210851 failed because recovery from primary datanode 10.100.2.27:50010 failed 6 times.  Pipeline was 10.100.2.27:50010. Aborting...
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2668)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
> 2011-06-02 13:29:09,819 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/urlhashv4/2f21ad9ef74c6e6d69259e9525f7b863/splits/0cf36594d8e349830d3b4f175b5b688a/flags/9107094927628682840.2f21ad9ef74c6e6d69259e9525f7b863 : java.io.IOException: Error Recovery for block blk_-2414361613802877611_81210865 failed because recovery from primary datanode 10.100.2.9:50010 failed 6 times.  Pipeline was 10.100.2.9:50010. Aborting...
> java.io.IOException: Error Recovery for block blk_-2414361613802877611_81210865 failed because recovery from primary datanode 10.100.2.9:50010 failed 6 times.  Pipeline was 10.100.2.9:50010. Aborting...
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2668)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
> 2011-06-02 13:29:09,820 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/urlhashv4/2f21ad9ef74c6e6d69259e9525f7b863/splits/0cf36594d8e349830d3b4f175b5b688a/url/6599763290219995724.2f21ad9ef74c6e6d69259e9525f7b863 : java.io.IOException: Error Recovery for block blk_798371171266377251_81210865 failed because recovery from primary datanode 10.100.2.3:50010 failed 6 times.  Pipeline was 10.100.2.3:50010. Aborting...
> java.io.IOException: Error Recovery for block blk_798371171266377251_81210865 failed because recovery from primary datanode 10.100.2.3:50010 failed 6 times.  Pipeline was 10.100.2.3:50010. Aborting...
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2668)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
> 2011-06-02 13:29:09,820 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/urlhashv4/2f21ad9ef74c6e6d69259e9525f7b863/splits/1e83e5a02f616d7437f10a6a636e2686/thumbs/2894023573319516345.2f21ad9ef74c6e6d69259e9525f7b863 : java.io.IOException: Bad connect ack with firstBadLink 10.100.2.11:50010
> java.io.IOException: Bad connect ack with firstBadLink 10.100.2.11:50010
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2963)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
> Thu Jun  2 13:31:07 CDT 2011 Starting regionserver on c1-s06
> Thu Jun  2 13:31:07 CDT 2011 Starting regionserver on c1-s06
>
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Thursday, June 02, 2011 2:25 PM
> To: user@hbase.apache.org
> Subject: Re: wrong region exception
>
> So, cluster is OK after the below crash?  Regions come up fine on new servers and .META. is fine?
>
> Below is interesting in that we failed a split because we could not write an edit to the .META. (how many handlers are you running with?  And what is going on on the .META. server at around this time?  Are you on 0.90.3 hbase?)  If we fail a split, we'll crash out the regionserver.  The recovery of the crashed regionserver should fix up the failed split so there are no holes in .META.  If this fixup did not run properly, this might be a cause of the WrongRegionException.
>
> St.Ack
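[For context on why a hole in .META. surfaces this way: each region serves a half-open key range [startKey, endKey), and the regions of a table should tile the key space with no gaps. A toy model of that bookkeeping in plain Python — made-up boundaries, not HBase internals:]

```python
# Each region is (startKey, endKey); "" means unbounded at that side.
regions = [("", "4"), ("4", "8"), ("8", "c"), ("c", "")]

def find_region(regions, key):
    """Return the region whose [start, end) range contains key, else None."""
    for start, end in regions:
        if key >= start and (end == "" or key < end):
            return (start, end)
    return None

# Healthy table: every key resolves to exactly one region.
print(find_region(regions, "7f"))  # ("4", "8")

# A failed split that never got cleaned up can leave a gap in the table:
holed = [("", "4"), ("8", "c"), ("c", "")]  # the ["4", "8") range is missing
print(find_region(holed, "7f"))  # None: no region claims the key
```

[In real HBase the client would route the put to a stale or neighboring region, and the server rejects it because the key is outside that region's range — the WrongRegionException seen in this thread.]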
>
> On Thu, Jun 2, 2011 at 11:34 AM, Robert Gonzalez <Robert.Gonzalez@maxpointinteractive.com> wrote:
>> And more info.  The copy dies on a regionserver failure.  Here is the exception when it dies:
>>
>> 2011-06-02 13:29:07,546 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=c1-s06.atxd.maxpointinteractive.com,60020,1306860799744, load=(requests=0, regions=136, usedHeap=307, maxHeap=2999): Failed open of daughter urlhashv4,837743DFAE34D105BB5B1E81810627B8,1307039228513.d1c0eb0aef559f349fdaca3452f55c10.
>> java.net.SocketTimeoutException: Call to c1-s19.atxd.maxpointinteractive.com/10.100.1.26:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.7:41085 remote=c1-s19.atxd.maxpointinteractive.com/10.100.1.26:60020]
>>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:784)
>>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>>        at $Proxy8.put(Unknown Source)
>>        at org.apache.hadoop.hbase.catalog.MetaEditor.addDaughter(MetaEditor.java:97)
>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1350)
>>        at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:328)
>>        at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:296)
>> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.7:41085 remote=c1-s19.atxd.maxpointinteractive.com/10.100.1.26:60020]
>>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>        at java.io.FilterInputStream.read(FilterInputStream.java:116)
>>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:281)
>>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>> -
>>
>> -----Original Message-----
>> From: Robert Gonzalez [mailto:Robert.Gonzalez@maxpointinteractive.com]
>> Sent: Thursday, June 02, 2011 12:29 PM
>> To: 'user@hbase.apache.org'
>> Subject: RE: wrong region exception
>>
>> Here's another clue: the process is taking up lots of CPU time, like it's in some kind of loop, but the output indicates that it's stuck on the same section.
>>
>> Robert
>>
>>
>> -----Original Message-----
>> From: Robert Gonzalez [mailto:Robert.Gonzalez@maxpointinteractive.com]
>> Sent: Thursday, June 02, 2011 12:07 PM
>> To: 'user@hbase.apache.org'
>> Subject: RE: wrong region exception
>>
>> Also, notice the output of my copy, where it's now stuck on the final line.  The first column is the number of rows, the second column is the key value:
>> total:904600000 7FECD7A2D11FFD850FDC7CA899CA3138
>> total:904700000 7FF0787C8EC28FF760BF0E38BB1F95C8
>> total:904800000 7FF418DDFCB134EFA7F1304762EA4A20
>> total:904900000 7FF7B7BC506DC77272DC9CBAE27DDD2D
>> total:905000000 7FFB5E24CC30B1FF8A9AE73068EFDB0B
>> total:905100000 7FFF0085ECE908C208BA083A99C05E42
>> total:905200000 8002A1540C309D99F587DAA712167091
>> total:905300000 800644A00B083B496A07B0633A51B528
>> total:905400000 8009E8A2EAC96846405476D294FDD999
>> total:905500000 800D8E0D6DB9259F16775B1080AE6968
>> total:905600000 80112E5D7E36AFB915DE4906BFF9F41C
>>
>> But in the hbase web page for the table urlhashv4 (the one we are copying into), it only got this far.
>>
>> urlhashv4,7FC19684831DF6E8ACCE0E690EF5BCAB,1306993100934.63e52326bc86f332a2f38056d934cbf3.  c1-s06.atxd.maxpointinteractive.com:60030  7FC19684831DF6E8ACCE0E690EF5BCAB  7FD3A81AD94CD99BEA6B4DA485BDDEBE
>> urlhashv4,7FD3A81AD94CD99BEA6B4DA485BDDEBE,1306993171155.0296cf0214b2d7dfe8ad8adac3ad7bf5.  c1-s06.atxd.maxpointinteractive.com:60030  7FD3A81AD94CD99BEA6B4DA485BDDEBE  7FE65457BB3A9492B6A0437124D6F5C7
>> urlhashv4,7FE65457BB3A9492B6A0437124D6F5C7,1306993227676.c038dcb619eabbd6d862634b83ba412e.  c1-s06.atxd.maxpointinteractive.com:60030  7FE65457BB3A9492B6A0437124D6F5C7  7FF5A0C88779F27177F1E5E8159680BE
>> urlhashv4,7FF5A0C88779F27177F1E5E8159680BE,1306993227676.5348549ea1080dca60d2e043da973258.  c1-s06.atxd.maxpointinteractive.com:60030  7FF5A0C88779F27177F1E5E8159680BE
>>
>>
>> That, in conjunction with the messages on the slave that is trying to insert the data, indicates to me that it's about to get into the same wrong region exception situation again.
>>
>>
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>> Stack
>> Sent: Wednesday, June 01, 2011 5:34 PM
>> To: user@hbase.apache.org
>> Subject: Re: wrong region exception
>>
>> We can't reach the server carrying .META. within 60 seconds.  What's going on on that server?  The next time the CatalogJanitor below runs, does it succeed, or does it just always fail?
>>
>> St.Ack
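[The "60000 millis timeout" in these traces is the client-side read timeout on the RPC socket: the connection is up, but the peer never produces a response within the window. The mechanism is easy to reproduce in miniature with plain Python sockets — not HBase's RPC stack, and a 0.1 s timeout standing in for HBase's 60 s:]

```python
import socket

# A connected pair of sockets stands in for client and regionserver.
client, server = socket.socketpair()
client.settimeout(0.1)  # HBase's ipc timeout here is 60 s (60000 millis)

try:
    # The peer never writes, so the read blocks until the timeout fires --
    # the same "timeout while waiting for channel to be ready for read".
    client.recv(4)
    timed_out = False
except socket.timeout:
    timed_out = True

print(timed_out)  # True
client.close()
server.close()
```

[The takeaway for debugging: the exception blames the reader, but the question to ask is why the server at the other end went silent (GC pause, overload, too few handlers).]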
>>
>> On Wed, Jun 1, 2011 at 2:27 PM, Robert Gonzalez <Robert.Gonzalez@maxpointinteractive.com> wrote:
>>> This is basically it (for the first time it died while copying); we have logging at warn level and above:
>>>
>>> 2011-05-27 16:44:27,565 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
>>> java.net.SocketTimeoutException: Call to /10.100.2.6:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.39:37717 remote=/10.100.2.6:60020]
>>>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:784)
>>>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>>>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>>>        at $Proxy6.delete(Unknown Source)
>>>        at org.apache.hadoop.hbase.catalog.MetaEditor.deleteDaughterReferenceInParent(MetaEditor.java:201)
>>>        at org.apache.hadoop.hbase.master.CatalogJanitor.removeDaughterFromParent(CatalogJanitor.java:233)
>>>        at org.apache.hadoop.hbase.master.CatalogJanitor.hasReferences(CatalogJanitor.java:275)
>>>        at org.apache.hadoop.hbase.master.CatalogJanitor.checkDaughter(CatalogJanitor.java:202)
>>>        at org.apache.hadoop.hbase.master.CatalogJanitor.cleanParent(CatalogJanitor.java:166)
>>>        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
>>>        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:85)
>>>        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>>> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.39:37717 remote=/10.100.2.6:60020]
>>>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>        at java.io.FilterInputStream.read(FilterInputStream.java:116)
>>>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:281)
>>>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>>>        at java.io.DataInputStream.readInt(DataInputStream.java:370)
>>>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>>>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
>>>
>>> -----Original Message-----
>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>>> Stack
>>> Sent: Wednesday, June 01, 2011 12:29 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: wrong region exception
>>>
>>> On Wed, Jun 1, 2011 at 7:32 AM, Robert Gonzalez <Robert.Gonzalez@maxpointinteractive.com> wrote:
>>>> We have a table copy program that copies the data from one table to another, and we can give it the start/end keys.  In this case we created a new blank table with the essential column families and let it run with start/end covering the whole range, 0-maxkey.  At about 30% of the way through, which is roughly 600 million rows, it died trying to write to the new table with the wrong region exception.  When we tried to restart the copy from that key plus some delta, it still crapped out.  No explanation in the logs the first time, but a series of timeouts in the second run.  Now we are trying the copy again with a new table.
>>>>
>>>
>>> Robert:
>>>
>>> Do you have the master logs for this copy run still?  If so, put them somewhere where I can pull them (or send them to me and I'll take a look).  I'd like to see the logs from the cluster to which you were copying the data.
>>>
>>> St.Ack
>>>
>>>
>>>> -----Original Message-----
>>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>>>> Stack
>>>> Sent: Tuesday, May 31, 2011 6:42 PM
>>>> To: user@hbase.apache.org
>>>> Subject: Re: wrong region exception
>>>>
>>>> So, what about this new WrongRegionException in the new cluster?  Can you figure out how it came about?  In the new cluster, is there also a hole?  Did you start the new cluster fresh, or copy from the old cluster?
>>>>
>>>> St.Ack
>>>>
>>>> On Tue, May 31, 2011 at 1:55 PM, Robert Gonzalez <Robert.Gonzalez@maxpointinteractive.com> wrote:
>>>>> Yeah, we learned the hard way early last year to follow the guidelines religiously.  I've gone over the requirements and checked off everything.  We even re-did our tables to have only 4 column families, down from 4x that amount.  We are at a loss to figure out why we seem to be cursed when it comes to HBase.  Hadoop is performing like a charm; pretty much every machine is busy 24/7.
>>>>>
>>>>> -----Original Message-----
>>>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>>>>> Stack
>>>>> Sent: Tuesday, May 31, 2011 3:03 PM
>>>>> To: user@hbase.apache.org
>>>>> Subject: Re: wrong region exception
>>>>>
>>>>> On Tue, May 31, 2011 at 10:42 AM, Robert Gonzalez <Robert.Gonzalez@maxpointinteractive.com> wrote:
>>>>>> Now I'm getting the wrong region exception on the new table that I'm copying the old table to.  Running hbck reveals an inconsistency in the new table.  The frustration is unbelievable.  Like I said before, it doesn't appear that HBase is ready for prime time.  I don't know how companies are using this successfully; it doesn't seem plausible.
>>>>>>
>>>>>
>>>>>
>>>>> Sorry you are not having a good experience.  I've not seen WrongRegionException in ages (grep these lists yourself).  It makes me suspect your environment.  Are you sure you've read the requirements section in the manual and set up ulimits, nprocs, and xceivers?
>>>>>
>>>>> St.Ack
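[The ulimit check Stack mentions can be scripted rather than eyeballed. A minimal sketch using Python's stdlib resource module — run it on a regionserver host to see what a child process inherits there; the 1024 default is the usual culprit behind mysterious DFSClient errors like the ones above:]

```python
import resource

# Soft/hard limits on open file descriptors for this process.
# The HBase requirements section recommends raising these well above
# the common default of 1024 (e.g. 32768), alongside nproc and
# dfs.datanode.max.xcievers on the datanodes.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"nofile soft={soft} hard={hard}")

if soft <= 1024:
    print("WARNING: open-file limit is at the default; HBase/HDFS will "
          "exhaust descriptors and xceivers under load")
```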
>>>>>
>>>>
>>>
>>
>
