hadoop-common-user mailing list archives

From stack <st...@duboce.net>
Subject Re: WrongRegionException - How do I fix it?
Date Fri, 04 Jan 2008 22:39:13 GMT
If possible, please move to TRUNK instead.  Most of the below have been 
addressed there (I can send you a patch if you want to run hbase TRUNK 
on hadoop 0.15.x).

Further comments inline below:


Lars George wrote:
> Hi Stack,
>
> Yes, it happens every time I insert particular rows. Before, it would 
> fail every now and then, but now that all the "good" rows are inserted 
> I am stuck with the ones that will not insert. And I am sure they did 
> insert once, with no error. So they are in there in limbo, but I 
> cannot retrieve, delete, or re-insert them.
>
> It mentions in the FAQ that I can switch on debugging through the UI, 
> but I cannot see where. I am using version 0.15.1, is that supposed to 
> have that option or do I need to go the log4j.properties plus restart 
> route?

This is a post-0.15 release feature (It says post-0.15.x in the FAQ).
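Until you can move, the log4j.properties-plus-restart route is the way 
to do it on 0.15.1.  A minimal sketch, assuming the stock 
conf/log4j.properties (restart hbase after editing):

  # conf/log4j.properties: turn up hbase logging; add dfs too if useful
  log4j.logger.org.apache.hadoop.hbase=DEBUG
  log4j.logger.org.apache.hadoop.dfs=DEBUG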


> I have errors all the time, which quite frankly worry me. Here is a 
> list of what I see so far:
>
> 1. At startup
>
> ==> /usr/local/hadoop/logs/hbase-pdc-regionserver-lv1-xen-pdc-40.worldlingo.com.log <==
> 2008-01-03 14:11:22,512 WARN org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using 
> builtin-java classes where applicable
> 2008-01-03 14:11:29,808 WARN org.apache.hadoop.hbase.HRegionServer: 
> Processing message (Retry: 0)
> java.io.IOException: java.io.IOException: 
> java.util.ConcurrentModificationException 
....


Fixed in TRUNK


>
>
> 2. Sporadically
>
> 2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode: 
> Unexpected error trying to delete block blk_-8931657506153335343. 
> Block not found in blockMap.
> 2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode: 
> Unexpected error trying to delete block blk_3775459202881005176. Block 
> not found in blockMap.
> 2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode: 
> Unexpected error trying to delete block blk_-283089329129695997. Block 
> not found in blockMap.
> 2008-01-03 21:32:00,644 WARN org.apache.hadoop.dfs.DataNode: 
> java.io.IOException: Error in deleting blocks.
>        at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:719)
>        at 
> org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:625)
>        at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:528)
>        at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1494)
>        at java.lang.Thread.run(Thread.java:595)
>
> These come with all sorts of blocks, they do not appear too often, but 
> on a regular basis.
>
In 0.15.x, hbase was doing updates against files that had been removed 
by another thread, which made for strange errors in hdfs.  That said, I 
don't recall having seen the above.  Do you have the namenode at DEBUG 
level?  If so, try tracing the above problematic blocks in its log; see 
if you can figure out a story as to what happened to these blocks.
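For example, something like the following greps may turn up a block's 
story (a sketch only; the exact log file names vary with the user and 
hostname in your setup):

  # on the namenode host: every mention of one problematic block
  grep 'blk_-8931657506153335343' /usr/local/hadoop/logs/hadoop-*-namenode-*.log
  # on the datanode that logged the delete failure
  grep 'blk_-8931657506153335343' /usr/local/hadoop/logs/hadoop-*-datanode-*.log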

If HDFS is in an inconsistent state, hbase will be inconsistent too.

> 3. Misc
>
> I see these too from this morning
>
> 2008-01-04 08:23:30,616 ERROR org.apache.hadoop.hbase.HRegionServer: 
> unable to process message: MSG_REGION_OPEN : regionname: 
> docs,DC20020096869_20020725,43610073395851568, startKey: 
> <DC20020096869_20020725>, tableDesc: {name: docs, families: 
> {contents:={name: contents, max versions: 3, compression: NONE, in 
> memory: false, max length: 2147483647, bloom filter: none}, 
> language:={name: language, max versions: 3, compression: NONE, in 
> memory: false, max length: 2147483647, bloom filter: none}, 
> mimetype:={name: mimetype, max versions: 3, compression: NONE, in 
> memory: false, max length: 2147483647, bloom filter: none}}}
> java.io.IOException: java.io.IOException: Cannot open filename 
> /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
>        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:585)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>
>        at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
>        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
>        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
>        at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:903)
>        at java.lang.Thread.run(Thread.java:595)
>
>

I'd guess this file is made of the above-cited problematic blocks.  Can 
you find it on hdfs?  Can you download it?  (Try running './bin/hadoop 
fsck /HBASE_DIR' -- see what it says.)
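For example (a sketch; the path is the one from your log above, and the 
fsck flags can be dropped if your version complains about them):

  # does the namenode still list the file?
  ./bin/hadoop dfs -ls /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info
  # can its blocks actually be read back?
  ./bin/hadoop dfs -get /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633 /tmp/1501965039462307633
  # and what fsck thinks of the whole hbase directory
  ./bin/hadoop fsck /hbase -files -blocks -locations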



> Another one is this:
>
> ==> /usr/local/hadoop/logs/hbase-pdc-regionserver-lv1-xen-pdc-62.worldlingo.com.log <==
> 2008-01-04 08:16:32,001 WARN org.apache.hadoop.hbase.HRegion: Region 
> docs,DC20020099792_20020725,9149203683830573099 is NOT splitable 
> though its aggregate size is 111.4m and desired size is 64.0m
>
> These come up with different region numbers.
>
This is ok.  We want to split the region because it's > 64MB, but this 
region has outstanding references to another, parent region, so it is 
not yet splittable (it should have turned splittable a little later in 
your log).
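If you want regions to grow larger before we try to split them, the 
threshold is configurable; I believe the property below is the right 
one, but the name and default are taken from hbase-default.xml in TRUNK, 
so verify against your version:

  <!-- hbase-site.xml: let regions reach 256MB before splitting
       (hbase.hregion.max.filesize is an assumption for 0.15.x) -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>268435456</value>
  </property>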


> And another one:
>
> 2008-01-03 11:27:55,437 WARN org.apache.hadoop.hbase.HStore: Failed 
> getting store size
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File does 
> not exist
>        at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
>        at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1360)
>        at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:428)
>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:585)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>
Fixed in TRUNK


>
> And another one:
>
> 2008-01-03 15:43:39,590 WARN org.apache.hadoop.dfs.DataNode: Got 
> exception while serving blk_3676251342939485484 to /192.168.105.21:
> java.io.IOException: Block blk_3676251342939485484 is not valid.
>        at org.apache.hadoop.dfs.FSDataset.getBlockFile(FSDataset.java:528)
>        at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1051)
>        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:843)
>        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:801)
>        at java.lang.Thread.run(Thread.java:595)
>
> Again, they come with different block numbers.
>
This hdfs exception may be recoverable IIRC; hdfs gets the block elsewhere.


> 4. Inserting document with errors
>
> If I try to add one of those documents that returns an error, this is 
> what I see in the logs so far:
>
> Uploading -> DC20020099841_20020725
>
> Server 20:
>
>
....
> 2007-12-30 17:26:42,392 INFO org.mortbay.util.Container: Started 
> HttpContext[/static,/static]
> 2007-12-30 17:26:42,395 INFO org.mortbay.http.SocketListener: Started 
> SocketListener on 0.0.0.0:60030
> 2007-12-30 17:26:42,395 INFO org.mortbay.util.Container: Started 
> org.mortbay.jetty.Server@1fd0fafc
> 2007-12-30 17:26:42,396 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 60020: starting
> 2007-12-30 17:26:42,397 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 0 on 60020: starting
> 2007-12-30 17:26:42,402 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 1 on 60020: starting
> 2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 3 on 60020: starting
> 2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 4 on 60020: starting
> 2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 5 on 60020: starting
> 2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 6 on 60020: starting
>


Please set these to run at DEBUG level (via log4j.properties, as 
above), at least while we are trying to figure out what's going on.

>
> Server 26:
>
> 2008-01-04 12:18:28,125 WARN org.apache.hadoop.hbase.HRegionServer: 
> java.io.IOException: java.io.IOException: Cannot open filename 
> /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
>        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:585)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>
> 2008-01-04 12:18:28,131 WARN org.apache.hadoop.hbase.HRegionServer: 
> java.io.IOException: java.io.IOException: Cannot open filename 
> /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
>        [same stack trace as above]
>
> 2008-01-04 12:18:28,147 WARN org.apache.hadoop.hbase.HRegionServer: 
> java.io.IOException: java.io.IOException: Cannot open filename 
> /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
>        [same stack trace as above]
>
The above are bad errors.  hdfs lost our files or, more likely, we 
mangled them when writing.

...
>
> Does this help?
>

Yes.

St.Ack


> Thanks,
> Lars
>
>
>
> stack wrote:
>> Lars George wrote:
>>> Hi,
>>>
>>> I have inserted about 3.5m documents into a single two-column table 
>>> in HBase running on 32 nodes. So far I have been able to insert most 
>>> of the data, but with the last million or so I am stuck with this 
>>> error:
>>>
>>> org.apache.hadoop.hbase.WrongRegionException: Requested row out of 
>>> range for HRegion docs,DC20020099792_20020725,9149203683830573099, 
>>> startKey='DC20020099792_20020725', endKey='DC20020099792_20020725', 
>>> row='DC20020099841_20020725'
>>
>> This happens every time you try to do an insert?
>>
>>> Querying for the document returns nothing, which means it looks like 
>>> the document does not exist - although I am sure I tried inserting it 
>>> a few times. Deleting or trying to re-insert returns the above error, 
>>> both through the API (using HTable) and through the HBase shell.
>>>
>>> I tried a restart of Hadoop/HBase to no avail. How do I fix this 
>>> problem? Any help is appreciated.
>>
>> Do you have DEBUG enabled for hbase (see 
>> http://wiki.apache.org/lucene-hadoop/Hbase/FAQ#4)?  Do the logs tell 
>> you anything more, e.g. any interesting exceptions?
>>
>> Which hbase version?
>>
>> St.Ack
>>

