hbase-dev mailing list archives

From "Pratyush Banerjee (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-514) table 'does not exist' when it does
Date Thu, 29 May 2008 10:10:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600724#action_12600724
] 

Pratyush Banerjee commented on HBASE-514:
-----------------------------------------


I have been using HBase for some time now. We have set up a web crawler and are downloading
the data into two tables in HBase. We are using HBase 0.1.1 for this purpose.

One of the tables, web_content, has the following structure:

+--------------------+--------------+-------------+-----------+------------+--------------+
| name               | max versions | compression | in memory | max length | bloom filter |
+--------------------+--------------+-------------+-----------+------------+--------------+
| content            | 1            | BLOCK       | false     | 2147483647 | none         |
| content_length     | 1            | BLOCK       | false     | 32         | none         |
| content_type       | 1            | BLOCK       | false     | 100        | none         |
| crawl_date         | 1            | BLOCK       | false     | 1000       | none         |
| http_headers       | 1            | BLOCK       | false     | 10000      | none         |
| last_modified_date | 1            | BLOCK       | false     | 100        | none         |
| outlinks_count     | 1            | BLOCK       | false     | 100        | none         |
| parsed_text        | 1            | BLOCK       | false     | 2147483647 | none         |
| title              | 1            | BLOCK       | false     | 1000       | none         |
+--------------------+--------------+-------------+-----------+------------+--------------+
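
(For anyone trying to reproduce this setup: a table with these families could also be built
programmatically. The following is only a rough sketch assuming the 0.1-era
HBaseAdmin/HTableDescriptor/HColumnDescriptor API; the exact 0.1.1 constructor signatures are
my assumption, and the single-argument HColumnDescriptor takes library defaults rather than
the exact settings shown in the descriptor above.)

// Rough sketch only: assumes the flat org.apache.hadoop.hbase package
// layout and admin API of HBase 0.1.x; signatures may differ in 0.1.1.
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseAdmin;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

public class CreateWebContent {
  public static void main(String[] args) throws IOException {
    HTableDescriptor desc = new HTableDescriptor("web_content");
    // One family per row of the table above; the no-frills constructor uses
    // library defaults, so max versions, compression, and max length would
    // still need to be set explicitly to match the descriptor exactly.
    String[] families = { "content:", "content_length:", "content_type:",
        "crawl_date:", "http_headers:", "last_modified_date:",
        "outlinks_count:", "parsed_text:", "title:" };
    for (String family : families) {
      desc.addFamily(new HColumnDescriptor(family));
    }
    new HBaseAdmin(new HBaseConfiguration()).createTable(desc);
  }
}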


We are using Heritrix-2.0.1 as our crawling engine and have written an HBase writer that
stores the contents of the downloaded pages in the above table.
Initially the crawler runs fine, and I often query the web_content table with "select count(*)
from web_content" to see the rate at which URLs are being written. However, once the crawler
has been running for hours and we have roughly 40K-50K URLs in the table, the table suddenly
seems to disappear: the same query reports that web_content is a non-existent table.
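
For illustration, the writer does nothing unusual; here is a rough sketch of such a writer,
assuming the 0.1-era HTable startUpdate/put/commit API (the class name, method, and columns
below are illustrative, not our actual Heritrix processor):

// Illustrative sketch of a row-at-a-time HBase 0.1-style writer.
// Assumes the startUpdate/put/commit/abort API; not the actual code.
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.io.Text;

public class WebContentWriter {
  private final HTable table;

  public WebContentWriter(HBaseConfiguration conf) throws IOException {
    this.table = new HTable(conf, new Text("web_content"));
  }

  // Store one crawled page: the URL is the row key, the raw bytes go
  // under content:, and the MIME type under content_type:.
  public void write(String url, byte[] page, String mimeType) throws IOException {
    long lockid = table.startUpdate(new Text(url));
    try {
      table.put(lockid, new Text("content:"), page);
      table.put(lockid, new Text("content_type:"), mimeType.getBytes());
      table.commit(lockid);
    } catch (IOException e) {
      table.abort(lockid);
      throw e;
    }
  }
}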

This has occurred multiple times, and I found that there is already a JIRA issue (HBASE-514)
for it. The issue is marked as fixed in 0.1.0, but we are running the later 0.1.1 release and
the problem still occurs.

We have a small cluster of 6-10 machines running hadoop-0.16.3 and HBase 0.1.1. One machine
hosts the NameNode, SecondaryNameNode, and HMaster, while the other machines run the
RegionServers and DataNodes.
I have run the system with DEBUG enabled and am attaching the log files.
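
For reference, DEBUG was turned on through log4j; a minimal sketch of the relevant line (the
exact file location under the HBase conf/ directory is an assumption about the layout):

# conf/log4j.properties on the HMaster and each regionserver
# (daemons restarted after the change)
log4j.logger.org.apache.hadoop.hbase=DEBUG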

Here is the shell output showing the error I am facing:

hql > select count(*) from web_content;
08/05/29 09:38:13 INFO hbase.HTable: Creating scanner over web_content starting at key
08/05/29 09:38:13 DEBUG hbase.HTable: Advancing internal scanner to startKey
08/05/29 09:38:13 DEBUG hbase.HTable: New region: address: 10.178.87.27:60020, regioninfo:
regionname: web_content,,1212056557578, startKey: <>, endKey: <>, encodedName:
1099787575, tableDesc: {name: web_content, families: {content:={name: content, max versions:
1, compression: BLOCK, in memory: false, max length: 2147483647, bloom filter: none}, content_length:={name:
content_length, max versions: 1, compression: BLOCK, in memory: false, max length: 32, bloom
filter: none}, content_type:={name: content_type, max versions: 1, compression: BLOCK, in
memory: false, max length: 100, bloom filter: none}, crawl_date:={name: crawl_date, max versions:
1, compression: BLOCK, in memory: false, max length: 1000, bloom filter: none}, http_headers:={name:
http_headers, max versions: 1, compression: BLOCK, in memory: false, max length: 10000, bloom
filter: none}, last_modified_date:={name: last_modified_date, max versions: 1, compression:
BLOCK, in memory: false, max length: 100, bloom filter: none}, outlinks_count:={name: outlinks_count,
max versions: 1, compression: BLOCK, in memory: false, max length: 100, bloom filter: none},
parsed_text:={name: parsed_text, max versions: 1, compression: BLOCK, in memory: false, max
length: 2147483647, bloom filter: none}, title:={name: title, max versions: 1, compression:
BLOCK, in memory: false, max length: 1000, bloom filter: none}}}
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.UnknownScannerException: Name:
-7311630080500504399
        at org.apache.hadoop.hbase.HRegionServer.next(HRegionServer.java:1425)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)

        at org.apache.hadoop.ipc.Client.call(Client.java:512)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:210)
        at $Proxy1.next(Unknown Source)
        at org.apache.hadoop.hbase.HTable$ClientScanner.next(HTable.java:914)
        at org.apache.hadoop.hbase.hql.SelectCommand.scanPrint(SelectCommand.java:233)
        at org.apache.hadoop.hbase.hql.SelectCommand.execute(SelectCommand.java:100)
        at org.apache.hadoop.hbase.hql.HQLClient.executeQuery(HQLClient.java:50)
        at org.apache.hadoop.hbase.Shell.main(Shell.java:114)
8797 row(s) in set. (140.71 sec)
hql > select count(*) from web_content;
08/05/29 09:45:34 INFO hbase.HTable: Creating scanner over web_content starting at key
08/05/29 09:45:34 DEBUG hbase.HTable: Advancing internal scanner to startKey
08/05/29 09:45:34 DEBUG hbase.HTable: New region: address: 10.178.87.27:60020, regioninfo:
regionname: web_content,,1212056557578, startKey: <>, endKey: <>, encodedName:
1099787575, tableDesc: {name: web_content, families: {content:={name: content, max versions:
1, compression: BLOCK, in memory: false, max length: 2147483647, bloom filter: none}, content_length:={name:
content_length, max versions: 1, compression: BLOCK, in memory: false, max length: 32, bloom
filter: none}, content_type:={name: content_type, max versions: 1, compression: BLOCK, in
memory: false, max length: 100, bloom filter: none}, crawl_date:={name: crawl_date, max versions:
1, compression: BLOCK, in memory: false, max length: 1000, bloom filter: none}, http_headers:={name:
http_headers, max versions: 1, compression: BLOCK, in memory: false, max length: 10000, bloom
filter: none}, last_modified_date:={name: last_modified_date, max versions: 1, compression:
BLOCK, in memory: false, max length: 100, bloom filter: none}, outlinks_count:={name: outlinks_count,
max versions: 1, compression: BLOCK, in memory: false, max length: 100, bloom filter: none},
parsed_text:={name: parsed_text, max versions: 1, compression: BLOCK, in memory: false, max
length: 2147483647, bloom filter: none}, title:={name: title, max versions: 1, compression:
BLOCK, in memory: false, max length: 1000, bloom filter: none}}}
08/05/29 09:45:44 DEBUG hbase.HTable: reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException:
web_content,,1212056557578
        at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1639)
        at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1611)
        at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1480)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)

08/05/29 09:45:44 DEBUG hbase.HConnectionManager$TableServers: reloading table servers because:
region offline: web_content,,1212056557578
08/05/29 09:45:54 DEBUG hbase.HConnectionManager$TableServers: reloading table servers because:
HRegionInfo was null or empty in .META.
08/05/29 09:46:04 DEBUG hbase.HConnectionManager$TableServers: reloading table servers because:
HRegionInfo was null or empty in .META.
org.apache.hadoop.hbase.TableNotFoundException: Table 'web_content' does not exist.
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:418)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:350)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:318)
        at org.apache.hadoop.hbase.HTable.getRegionLocation(HTable.java:114)
        at org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:889)
        at org.apache.hadoop.hbase.HTable$ClientScanner.<init>(HTable.java:817)
        at org.apache.hadoop.hbase.HTable.obtainScanner(HTable.java:522)
        at org.apache.hadoop.hbase.HTable.obtainScanner(HTable.java:411)
        at org.apache.hadoop.hbase.hql.SelectCommand.scanPrint(SelectCommand.java:219)
        at org.apache.hadoop.hbase.hql.SelectCommand.execute(SelectCommand.java:100)
        at org.apache.hadoop.hbase.hql.HQLClient.executeQuery(HQLClient.java:50)
        at org.apache.hadoop.hbase.Shell.main(Shell.java:114)
0 row(s) in set. (40.37 sec)

Interestingly, if I then query HBase with "show tables", it still lists both tables.
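
One way to see what the client sees, independent of HQL, is to dump the rows of .META. and
check whether the web_content region entry is still present when this happens. Below is a
rough sketch, assuming the 0.1-era obtainScanner/HScannerInterface API (the info: column
filter and empty start row are my assumptions):

// Sketch: dump the row keys of .META. so we can see whether the
// web_content region entry survives the failure.
// Assumes the HBase 0.1-era scanner API (HScannerInterface).
import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.io.Text;

public class DumpMeta {
  public static void main(String[] args) throws IOException {
    HTable meta = new HTable(new HBaseConfiguration(), new Text(".META."));
    HScannerInterface scanner =
        meta.obtainScanner(new Text[] { new Text("info:") }, new Text(""));
    try {
      HStoreKey key = new HStoreKey();
      SortedMap<Text, byte[]> row = new TreeMap<Text, byte[]>();
      while (scanner.next(key, row)) {
        // Row keys in .META. look like "tablename,startkey,timestamp";
        // if no row starts with "web_content," the client reports
        // TableNotFoundException even though the table data exists.
        System.out.println(key.getRow());
        row.clear();
      }
    } finally {
      scanner.close();
    }
  }
}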

If anybody can tell me what exactly is going wrong, or what the intended fix is, it would be
of great help.

thanks

Pratyush

> table 'does not exist' when it does
> -----------------------------------
>
>                 Key: HBASE-514
>                 URL: https://issues.apache.org/jira/browse/HBASE-514
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.16.0
>            Reporter: stack
>            Assignee: Bryan Duxbury
>             Fix For: 0.1.0
>
>         Attachments: 514-0.1-v2.patch, 514-0.1-v3.patch, 514-0.1.patch, region-v2.patch
>
>
> This one I've seen a few times. In hql, I do show tables and it shows my table. I then
> try to do a select against the table and hql reports that the table does not exist. Digging,
> what's happening is that the getClosest facility is failing to find the first table region
> in the .META. table. I hacked up a region-reading tool -- attached (for the 0.1 branch) --
> and tried it against both a copy and the actual instance of the region, and it could do the
> getClosest fine. I'm pretty sure I restarted the HRS, and when it came up again the master
> had handed it .META. again and it was again failing to find the first region in the table
> (I looked around in the server logs and it seemed 'healthy').

