hbase-issues mailing list archives

From "liang xie (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-3834) Store ignores checksum errors when opening files
Date Fri, 14 Sep 2012 11:14:07 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liang xie updated HBASE-3834:
-----------------------------

    Attachment: hbase-3834.tar.gz2

Hi all, I ran a manual test today. It turns out this issue appears to be gone in recent
versions, at least in 0.94.0.

Here are my test details:
0) My env: Ubuntu 10.10, hbase-0.94.0 release, standalone mode
1) Start HBase
2) From the hbase shell:
hbase(main):002:0> status
1 servers, 0 dead, 2.0000 average load

hbase(main):003:0> version
0.94.0, r1332822, Tue May  1 21:43:54 UTC 2012

hbase(main):004:0> create 'test','cf'
0 row(s) in 1.1520 seconds

hbase(main):005:0> list 'test'
TABLE
test
1 row(s) in 0.0230 seconds

hbase(main):010:0> put 'test','row1','cf:a','value1'
0 row(s) in 0.0160 seconds

hbase(main):011:0> put 'test','row2','cf:b','value2'
0 row(s) in 0.0070 seconds

hbase(main):012:0> put 'test','row3','cf:c','value3'
0 row(s) in 0.0070 seconds

hbase(main):017:0> scan 'test'
ROW                   COLUMN+CELL
 row1                 column=cf:a, timestamp=1347619555652, value=value1
 row2                 column=cf:b, timestamp=1347619562943, value=value2
 row3                 column=cf:c, timestamp=1347619576704, value=value3
3 row(s) in 0.0310 seconds

hbase(main):027:0> flush 'test'
0 row(s) in 0.0770 seconds

hbase(main):028:0> exit

3) Shut down HBase, then edit the corresponding files with vim

4) Start HBase again

5) From the log file, we can see:
2012-09-14 18:54:15,096 INFO org.apache.hadoop.hbase.regionserver.Store: time to purge deletes set to 0ms in store null
2012-09-14 18:54:15,118 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: b[0,
286]=454e010400000012136866696c652e4156475f56414c55455f4c454e0104000000040d6866696c652e4c4153544b455901130004726f77330263666300000139c462db8004a887830f545241424c4b2224000000000000010f00000000000000a600000001000000000000001f0000000000000000000002f600000000000000040000000200000001000000000000000000000000000000006f72672e6170616368652e6861646f6f702e68626173652e4b657956616c7565244b6579436f6d70617261746f7200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000020a
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/tmp/hbase/test/4c9f2cfda63b2e9785815ed2e841d052/cf/269b59832d68465687ebce880026a301 at 512
        at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
        at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
...
2012-09-14 18:54:15,123 INFO org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opening of region {NAME => 'test,,1347619334483.4c9f2cfda63b2e9785815ed2e841d052.', STARTKEY => '', ENDKEY => '', ENCODED => 4c9f2cfda63b2e9785815ed2e841d052,} failed, marking as FAILED_OPEN in ZK
2012-09-14 18:54:15,123 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:48394-0x139c46a10d40001 Attempting to transition node 4c9f2cfda63b2e9785815ed2e841d052 from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN
2012-09-14 18:54:15,137 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:48394-0x139c46a10d40001 Successfully transitioned node 4c9f2cfda63b2e9785815ed2e841d052 from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN
2012-09-14 18:54:15,138 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_FAILED_OPEN, server=xieliang,48394,1347620050131, region=4c9f2cfda63b2e9785815ed2e841d052
2012-09-14 18:54:15,141 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 4c9f2cfda63b2e9785815ed2e841d052
2012-09-14 18:54:15,141 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=test,,1347619334483.4c9f2cfda63b2e9785815ed2e841d052. state=CLOSED, ts=1347620055124, server=xieliang,48394,1347620050131
2012-09-14 18:54:15,141 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:41380-0x139c46a10d40000 Creating (or updating) unassigned node for 4c9f2cfda63b2e9785815ed2e841d052 with OFFLINE state
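
For readers who want to see why a small manual edit to a store file produces the ChecksumException above, here is a rough sketch of per-chunk checksum verification. It is not Hadoop's FSInputChecker code, only an illustration of the idea: a CRC is kept for each fixed-size chunk of the file, and a read fails at the start offset of the first chunk whose CRC no longer matches. The 512-byte chunk size is an assumption, and the helper names (chunk_checksums, first_bad_offset, flip_byte) are hypothetical.

```python
import zlib

# Assumed chunk size for this illustration; the real value is a filesystem setting.
CHUNK = 512


def chunk_checksums(data: bytes, chunk: int = CHUNK) -> list:
    """Compute one CRC32 per fixed-size chunk of the data."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def first_bad_offset(data: bytes, sums: list, chunk: int = CHUNK):
    """Return the start offset of the first chunk whose CRC32 differs, or None."""
    for idx, expected in enumerate(sums):
        start = idx * chunk
        if zlib.crc32(data[start:start + chunk]) != expected:
            return start
    return None


def flip_byte(data: bytes, offset: int) -> bytes:
    """Simulate a manual edit (step 3) by inverting every bit of one byte."""
    corrupted = bytearray(data)
    corrupted[offset] ^= 0xFF
    return bytes(corrupted)


# Stand-in for a store file: 1024 bytes, i.e. two 512-byte chunks.
original = bytes(range(256)) * 4
sums = chunk_checksums(original)

# Corrupt one byte inside the second chunk.
corrupted = flip_byte(original, 700)

print(first_bad_offset(original, sums))   # None
print(first_bad_offset(corrupted, sums))  # 512
```

In this sketch the corruption at byte 700 is reported at offset 512 because verification works on whole chunks, not on individual bytes.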
                
> Store ignores checksum errors when opening files
> ------------------------------------------------
>
>                 Key: HBASE-3834
>                 URL: https://issues.apache.org/jira/browse/HBASE-3834
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.2
>            Reporter: Todd Lipcon
>            Assignee: liang xie
>            Priority: Critical
>             Fix For: 0.90.8
>
>         Attachments: hbase-3834.tar.gz2
>
>
> If you corrupt one of the storefiles in a region (eg using vim to muck up some bytes),
> the region will still open, but that storefile will just be ignored with a log message. We
> should probably not do this in general - better to keep that region unassigned and force an
> admin to make a decision to remove the bad storefile.
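
The difference between the two behaviors in the description can be sketched as follows. This is not HBase code; open_store_files, CorruptStoreFile, and the verify callback are hypothetical names used only to contrast "skip the bad storefile and open anyway" with "fail the region open and leave the decision to an admin".

```python
class CorruptStoreFile(Exception):
    """Raised when a store file fails verification in strict mode (hypothetical)."""


def open_store_files(paths, verify, strict=True):
    """Open every store file of a region.

    verify(path) returns True when the file passes its checksum check.
    strict=True  : any bad file aborts the whole open (the requested behavior).
    strict=False : bad files are skipped and only reported (the reported bug).
    """
    opened, skipped = [], []
    for path in paths:
        if verify(path):
            opened.append(path)
        elif strict:
            raise CorruptStoreFile(path)
        else:
            skipped.append(path)
    return opened, skipped


# Toy verifier: pretend the file named "bad" is corrupt.
def is_ok(path):
    return path != "bad"


# Lenient mode silently serves the region without the corrupt file's data.
print(open_store_files(["a", "bad", "c"], is_ok, strict=False))

# Strict mode refuses to open the region at all.
try:
    open_store_files(["a", "bad", "c"], is_ok, strict=True)
except CorruptStoreFile as exc:
    print("region open failed, corrupt store file:", exc)
```

The lenient mode is the risky one because reads against the region would succeed while quietly missing whatever rows lived in the skipped file.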

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
