hadoop-common-user mailing list archives

From Lars George <l...@worldlingo.com>
Subject Re: WrongRegionException - How do I fix it?
Date Fri, 04 Jan 2008 23:08:41 GMT
Hi Stack,

Can and will do, but does that make the error go away, i.e. 
automagically fix it? Or is it broken, and can nothing be done about it now?

Lars

stack wrote:
> If possible, please move to TRUNK instead.  Most of the below have 
> been addressed there (I can send you a patch if you want to run hbase 
> TRUNK on hadoop 0.15.x).
>
> Further comments inline below:
>
>
> Lars George wrote:
>> Hi Stack,
>>
>> Yes, it happens every time I insert particular rows. Before, it would 
>> fail every now and then, but now that all the "good" rows are 
>> inserted I am stuck with the ones that do not insert. And I am sure 
>> they did insert once, with no error. So they are in there in limbo, but I 
>> can neither retrieve, delete, nor re-insert them.
>>
>> It mentions in the FAQ that I can switch on debugging through the UI, 
>> but I cannot see where. I am using version 0.15.1, is that supposed 
>> to have that option or do I need to go the log4j.properties plus 
>> restart route?
>
> This is a post-0.15 release feature (It says post-0.15.x in the FAQ).
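> For the log4j.properties route, a line along these lines in the 
> log4j.properties your region servers pick up (conf/log4j.properties in 
> your hadoop install if you run the contrib hbase; the exact location is 
> an assumption on my part), followed by a restart, should turn on DEBUG 
> for the hbase classes:
>
>   # hypothetical snippet -- DEBUG for everything under org.apache.hadoop.hbase
>   log4j.logger.org.apache.hadoop.hbase=DEBUG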
>
>
>> I have errors all the time - which quite frankly worry me. Here is a 
>> list of what I see so far:
>>
>> 1. At startup
>>
>> ==> 
>> /usr/local/hadoop/logs/hbase-pdc-regionserver-lv1-xen-pdc-40.worldlingo.com.log 
>> <==
>> 2008-01-03 14:11:22,512 WARN org.apache.hadoop.util.NativeCodeLoader: 
>> Unable to load native-hadoop library for your platform... using 
>> builtin-java classes where applicable
>> 2008-01-03 14:11:29,808 WARN org.apache.hadoop.hbase.HRegionServer: 
>> Processing message (Retry: 0)
>> java.io.IOException: java.io.IOException: 
>> java.util.ConcurrentModificationException 
> ....
>
>
> Fixed in TRUNK
>
>
>>
>>
>> 2. Sporadically
>>
>> 2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode: 
>> Unexpected error trying to delete block blk_-8931657506153335343. 
>> Block not found in blockMap.
>> 2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode: 
>> Unexpected error trying to delete block blk_3775459202881005176. 
>> Block not found in blockMap.
>> 2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode: 
>> Unexpected error trying to delete block blk_-283089329129695997. 
>> Block not found in blockMap.
>> 2008-01-03 21:32:00,644 WARN org.apache.hadoop.dfs.DataNode: 
>> java.io.IOException: Error in deleting blocks.
>>        at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:719)
>>        at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:625)
>>        at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:528)
>>        at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1494)
>>        at java.lang.Thread.run(Thread.java:595)
>>
>> These occur with all sorts of blocks; they do not appear too often, 
>> but they do show up on a regular basis.
>>
> 0.15.x was doing updates against files that had been removed by 
> another thread, which made for strange errors in hdfs.  That said, I don't 
> recall having seen the above.  Do you have the namenode at DEBUG level?  
> If so, try tracing the above problematic blocks in its log; see if you 
> can figure out a story as to what happened to these blocks.
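> As a concrete sketch (the log directory is taken from the paths earlier 
> in your mail; the rest is an assumption about your layout), something like
>
>   # run on the namenode host; the block id is one from your DataNode warnings
>   grep blk_-8931657506153335343 /usr/local/hadoop/logs/*namenode*.log*
>
> should pull out whatever the namenode recorded about that block's 
> allocation, replication and deletion.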
>
> If HDFS is in an inconsistent state, hbase will be inconsistent too.
>
>> 3. Misc
>>
>> I see these too from this morning
>>
>> 2008-01-04 08:23:30,616 ERROR org.apache.hadoop.hbase.HRegionServer: unable to process message: MSG_REGION_OPEN : regionname: docs,DC20020096869_20020725,43610073395851568, startKey: <DC20020096869_20020725>, tableDesc: {name: docs, families: {contents:={name: contents, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, language:={name: language, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, mimetype:={name: mimetype, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
>> java.io.IOException: java.io.IOException: Cannot open filename /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
>>        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
>>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:585)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>>
>>        at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
>>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>        at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
>>        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
>>        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
>>        at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:903)
>>        at java.lang.Thread.run(Thread.java:595)
>>
>>
>
> I'd guess this file is made of the above-cited problematic blocks.  
> Can you find it on hdfs?  Can you download it?  (Try doing a 
> './bin/hadoop fsck /HBASE_DIR' -- see what it says).
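> For example, assuming your hbase root really is /hbase as the paths in 
> your log suggest, something like
>
>   # check file/block health under the hbase root
>   ./bin/hadoop fsck /hbase -files -blocks -locations
>   # try to pull the problematic store file out of hdfs (path copied from your log)
>   ./bin/hadoop fs -get /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633 /tmp/
>
> would tell you whether the namenode thinks the file is healthy and 
> whether its blocks can actually be read back.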
>
>
>
>> Another one is this:
>>
>> ==> 
>> /usr/local/hadoop/logs/hbase-pdc-regionserver-lv1-xen-pdc-62.worldlingo.com.log 
>> <==
>> 2008-01-04 08:16:32,001 WARN org.apache.hadoop.hbase.HRegion: Region docs,DC20020099792_20020725,9149203683830573099 is NOT splitable though its aggregate size is 111.4m and desired size is 64.0m
>>
>> These come up with different region numbers.
>>
> This is ok.  We want to split the region because it is > 64MB, but this 
> region has outstanding references to a parent region, so it is not 
> yet splittable (it should have turned splittable a little later in 
> your log).
>
>
>> And another one:
>>
>> 2008-01-03 11:27:55,437 WARN org.apache.hadoop.hbase.HStore: Failed 
>> getting store size
>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File does 
>> not exist
>>        at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
>>        at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1360)
>>        at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:428)
>>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:585)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>>
> Fixed in TRUNK
>
>
>>
>> And another one:
>>
>> 2008-01-03 15:43:39,590 WARN org.apache.hadoop.dfs.DataNode: Got exception while serving blk_3676251342939485484 to /192.168.105.21:
>> java.io.IOException: Block blk_3676251342939485484 is not valid.
>>        at org.apache.hadoop.dfs.FSDataset.getBlockFile(FSDataset.java:528)
>>        at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1051)
>>        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:843)
>>        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:801)
>>        at java.lang.Thread.run(Thread.java:595)
>>
>> Again, they come with different block numbers.
>>
> This hdfs exception may be recoverable IIRC; hdfs gets the block 
> elsewhere.
>
>
>> 4. Inserting document with errors
>>
>> If I try to add one of the documents that comes back with an error, 
>> this is what I see in the logs so far:
>>
>> Uploading -> DC20020099841_20020725
>>
>> Server 20:
>>
>>
> ....
>> 2007-12-30 17:26:42,392 INFO org.mortbay.util.Container: Started 
>> HttpContext[/static,/static]
>> 2007-12-30 17:26:42,395 INFO org.mortbay.http.SocketListener: Started 
>> SocketListener on 0.0.0.0:60030
>> 2007-12-30 17:26:42,395 INFO org.mortbay.util.Container: Started 
>> org.mortbay.jetty.Server@1fd0fafc
>> 2007-12-30 17:26:42,396 INFO org.apache.hadoop.ipc.Server: IPC Server 
>> listener on 60020: starting
>> 2007-12-30 17:26:42,397 INFO org.apache.hadoop.ipc.Server: IPC Server 
>> handler 0 on 60020: starting
>> 2007-12-30 17:26:42,402 INFO org.apache.hadoop.ipc.Server: IPC Server 
>> handler 1 on 60020: starting
>> 2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server 
>> handler 3 on 60020: starting
>> 2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server 
>> handler 4 on 60020: starting
>> 2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server 
>> handler 5 on 60020: starting
>> 2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server 
>> handler 6 on 60020: starting
>>
>
>
> Please set these to run at DEBUG level, at least while we are trying 
> to figure out what's going on.
>
>>
>> Server 26:
>>
>> 2008-01-04 12:18:28,125 WARN org.apache.hadoop.hbase.HRegionServer: java.io.IOException: java.io.IOException: Cannot open filename /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
>>        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
>>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:585)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>>
>> 2008-01-04 12:18:28,131 WARN org.apache.hadoop.hbase.HRegionServer: java.io.IOException: java.io.IOException: Cannot open filename /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
>>        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
>>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:585)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>>
>> 2008-01-04 12:18:28,147 WARN org.apache.hadoop.hbase.HRegionServer: java.io.IOException: java.io.IOException: Cannot open filename /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
>>        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
>>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:585)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>>
> The above are bad errors.  hdfs lost our files or, more likely, we 
> mangled their writing.
>
> ...
>>
>> Does this help?
>>
>
> Yes.
>
> St.Ack
>
>
>> Thanks,
>> Lars
>>
>>
>>
>> stack wrote:
>>> Lars George wrote:
>>>> Hi,
>>>>
>>>> I have inserted about 3.5m documents into a single two-column table 
>>>> in HBase running on 32 nodes. So far I have been able to insert most 
>>>> of the data, but with the last million or so I am stuck with this error:
>>>>
>>>> org.apache.hadoop.hbase.WrongRegionException: Requested row out of 
>>>> range for HRegion docs,DC20020099792_20020725,9149203683830573099, 
>>>> startKey='DC20020099792_20020725', endKey='DC20020099792_20020725', 
>>>> row='DC20020099841_20020725'
>>>
>>> This happens every time you try to do an insert?
>>>
>>>> Querying for the document returns nothing, meaning it looks as if the 
>>>> document does not exist - although I am sure I tried inserting it a 
>>>> few times. Deleting or trying to re-insert returns the above error, 
>>>> both through the API (using HTable) and through the HBase 
>>>> shell.
>>>>
>>>> I tried a restart of Hadoop/HBase to no avail. How do I fix this 
>>>> problem? Any help is appreciated.
>>>
>>> Do you have DEBUG enabled for hbase (see 
>>> http://wiki.apache.org/lucene-hadoop/Hbase/FAQ#4)?  Do the logs tell 
>>> you anything more, e.g. any interesting exceptions?
>>>
>>> Which hbase version?
>>>
>>> St.Ack
>>>
>
