hbase-issues mailing list archives

From "Jerry He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
Date Tue, 18 Jun 2013 22:21:22 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687286#comment-13687286 ]

Jerry He commented on HBASE-8760:
---------------------------------

[~mbertozzi]
Yes, restore/clone snapshot works well if the parent hfile has not been deleted. I was amazed to
see that the part of the code that figures out the Links and half References works so well. Below
are some region server log dumps.

If the parent hfile has been deleted before the restore/clone, this is the error.
table1 is the original table the snapshot was taken from; table1_clone is the table created by clone_snapshot.
--------------------------------------------------------------------------------------------------------
$ hadoop fs -lsr /hbase/.hbase-snapshot
/hbase/.hbase-snapshot/.tmp
/hbase/.hbase-snapshot/my_table1_snapshot
/hbase/.hbase-snapshot/my_table1_snapshot/.snapshotinfo
/hbase/.hbase-snapshot/my_table1_snapshot/.tableinfo.0000000001
/hbase/.hbase-snapshot/my_table1_snapshot/.tmp
/hbase/.hbase-snapshot/my_table1_snapshot/399a750df7646a7fb38d35780ca5254f
/hbase/.hbase-snapshot/my_table1_snapshot/399a750df7646a7fb38d35780ca5254f/.regioninfo
/hbase/.hbase-snapshot/my_table1_snapshot/399a750df7646a7fb38d35780ca5254f/.tmp
/hbase/.hbase-snapshot/my_table1_snapshot/399a750df7646a7fb38d35780ca5254f/family1
/hbase/.hbase-snapshot/my_table1_snapshot/399a750df7646a7fb38d35780ca5254f/family1/c272990ce92c409d8cdebd6afcb8cc14.3e96bb19fb20e4edd27949f894878714
/hbase/.hbase-snapshot/my_table1_snapshot/f3b8401f06dc4cbe2043f26df42e1b0e
/hbase/.hbase-snapshot/my_table1_snapshot/f3b8401f06dc4cbe2043f26df42e1b0e/.regioninfo
/hbase/.hbase-snapshot/my_table1_snapshot/f3b8401f06dc4cbe2043f26df42e1b0e/.tmp
/hbase/.hbase-snapshot/my_table1_snapshot/f3b8401f06dc4cbe2043f26df42e1b0e/family1
/hbase/.hbase-snapshot/my_table1_snapshot/f3b8401f06dc4cbe2043f26df42e1b0e/family1/c272990ce92c409d8cdebd6afcb8cc14.3e96bb19fb20e4edd27949f894878714
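For readers unfamiliar with the naming: each daughter's file in the listing above is a Reference named <hfile>.<encoded parent region>, so both daughter regions point at the same parent hfile c272990ce92c409d8cdebd6afcb8cc14 in region 3e96bb19fb20e4edd27949f894878714, and that hfile is not itself part of the snapshot. The Java sketch below splits such a name and builds the parent hfile path it depends on. ReferenceNameSketch and its methods are hypothetical helpers, not HBase API, and the sketch assumes encoded region names are 32 lowercase hex characters, as they are in this listing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not HBase API): parses Reference file names of the form
// "<hfile>.<encodedParentRegion>", like the two daughter files in the listing.
final class ReferenceNameSketch {
    // Assumption: encoded region names are 32 lowercase hex characters.
    private static final Pattern REF = Pattern.compile("^(.+)\\.([0-9a-f]{32})$");

    static final class Ref {
        final String hfile;        // hfile name inside the parent region
        final String parentRegion; // encoded name of the parent region

        Ref(String hfile, String parentRegion) {
            this.hfile = hfile;
            this.parentRegion = parentRegion;
        }
    }

    private ReferenceNameSketch() {}

    /** Returns the parsed Reference, or null for a plain hfile name with no parent suffix. */
    static Ref parse(String fileName) {
        Matcher m = REF.matcher(fileName);
        return m.matches() ? new Ref(m.group(1), m.group(2)) : null;
    }

    /** Builds "<rootDir>/<table>/<parentRegion>/<family>/<hfile>", the file the Reference needs. */
    static String parentHFilePath(String rootDir, String table, String family, Ref ref) {
        return rootDir + "/" + table + "/" + ref.parentRegion + "/" + family + "/" + ref.hfile;
    }
}
```

With rootDir hdfs://hdtest009:9000/hbase, table table1 and family family1, the path produced for the daughter file is the first of the three locations printed in the FileNotFoundException in the log.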


2013-06-14 22:40:03,065 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: table1_clone,,1371270778458.3ab8becbaddb796fc8a036762dbd9493.
2013-06-14 22:40:03,065 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13f2c26b6106505 Attempting to transition node 3ab8becbaddb796fc8a036762dbd9493 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2013-06-14 22:40:03,067 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13f2c26b6106505 Successfully transitioned node 3ab8becbaddb796fc8a036762dbd9493 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2013-06-14 22:40:03,067 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: {NAME => 'table1_clone,,1371270778458.3ab8becbaddb796fc8a036762dbd9493.', STARTKEY => '', ENDKEY => 'user1959958463', ENCODED => 3ab8becbaddb796fc8a036762dbd9493,}
2013-06-14 22:40:03,067 INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ...
2013-06-14 22:40:03,067 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated table1_clone,,1371270778458.3ab8becbaddb796fc8a036762dbd9493.
2013-06-14 22:40:03,069 INFO org.apache.hadoop.hbase.regionserver.Store: time to purge deletes set to 0ms in store family1
2013-06-14 22:40:03,069 INFO org.apache.hadoop.hbase.regionserver.Store: hbase.hstore.compaction.min = 3
2013-06-14 22:40:03,070 DEBUG org.apache.hadoop.hbase.regionserver.StoreFile: reference 'hdfs://hdtest009:9000/hbase/table1_clone/3ab8becbaddb796fc8a036762dbd9493/family1/table1=3e96bb19fb20e4edd27949f894878714-c272990ce92c409d8cdebd6afcb8cc14.3e96bb19fb20e4edd27949f894878714' to region=3e96bb19fb20e4edd27949f894878714 hfile=table1=3e96bb19fb20e4edd27949f894878714-c272990ce92c409d8cdebd6afcb8cc14
2013-06-14 22:40:03,071 DEBUG org.apache.hadoop.hbase.regionserver.StoreFile: Store file hdfs://hdtest009:9000/hbase/table1_clone/3ab8becbaddb796fc8a036762dbd9493/family1/table1=3e96bb19fb20e4edd27949f894878714-c272990ce92c409d8cdebd6afcb8cc14.3e96bb19fb20e4edd27949f894878714 is a bottom reference to hdfs://hdtest009:9000/hbase/table1_clone/3e96bb19fb20e4edd27949f894878714/family1/table1=3e96bb19fb20e4edd27949f894878714-c272990ce92c409d8cdebd6afcb8cc14
2013-06-14 22:40:03,072 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=table1_clone,,1371270778458.3ab8becbaddb796fc8a036762dbd9493., starting to roll back the global memstore size.
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: Unable to open link:
org.apache.hadoop.hbase.io.HFileLink locations=[hdfs://hdtest009:9000/hbase/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.tmp/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.archive/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14]
        at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:631)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:544)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4372)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4320)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:101)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
        at java.lang.Thread.run(Thread.java:738)
Caused by: java.io.IOException: java.io.FileNotFoundException: Unable to open link: org.apache.hadoop.hbase.io.HFileLink
locations=[hdfs://hdtest009:9000/hbase/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.tmp/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.archive/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14]
        at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:481)
        at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:258)
        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3322)
        at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:606)
        at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:604)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
        at java.util.concurrent.FutureTask.run(FutureTask.java:149)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
        at java.util.concurrent.FutureTask.run(FutureTask.java:149)
        ... 3 more
Caused by: java.io.FileNotFoundException: Unable to open link: org.apache.hadoop.hbase.io.HFileLink
locations=[hdfs://hdtest009:9000/hbase/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.tmp/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14,
hdfs://hdtest009:9000/hbase/.archive/table1/3e96bb19fb20e4edd27949f894878714/family1/c272990ce92c409d8cdebd6afcb8cc14]
        at org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:375)
        at org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:97)
        at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:537)
        at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:639)
        at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:457)
        at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:452)
        ... 8 more
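The clone's store file name combines both mechanisms: it is a Reference (suffix .<parent region>) to an HFileLink named <table>=<region>-<hfile>. The trace shows the link being looked up in three places, the live table directory, .tmp and .archive, and failing because none of them still holds c272990ce92c409d8cdebd6afcb8cc14. The Java sketch below reproduces that lookup order from a link name. HFileLinkSketch and its methods are hypothetical (not HBase's HFileLink/FileLink API), the 32-hex-character region name is an assumption, and the existence check is passed in as a predicate rather than calling HDFS.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not HBase API) mirroring the lookup order printed in the trace.
final class HFileLinkSketch {
    // "<table>=<encodedRegion>-<hfile>"; assumes a 32-hex-character encoded region name.
    private static final Pattern LINK = Pattern.compile("^([^=]+)=([0-9a-f]{32})-(.+)$");

    private HFileLinkSketch() {}

    /** Candidate paths in the order the exception lists them, or null if linkName is not a link. */
    static List<String> candidateLocations(String rootDir, String family, String linkName) {
        Matcher m = LINK.matcher(linkName);
        if (!m.matches()) {
            return null;
        }
        String rel = m.group(1) + "/" + m.group(2) + "/" + family + "/" + m.group(3);
        return Arrays.asList(
                rootDir + "/" + rel,            // live table directory
                rootDir + "/.tmp/" + rel,       // .tmp directory
                rootDir + "/.archive/" + rel);  // .archive directory
    }

    /**
     * Returns the first location that exists. An empty result corresponds to the
     * "Unable to open link" FileNotFoundException in the trace.
     */
    static Optional<String> resolve(List<String> locations, Predicate<String> exists) {
        for (String location : locations) {
            if (exists.test(location)) {
                return Optional.of(location);
            }
        }
        return Optional.empty();
    }
}
```

Feeding it rootDir hdfs://hdtest009:9000/hbase, family family1 and the link name table1=3e96bb19fb20e4edd27949f894878714-c272990ce92c409d8cdebd6afcb8cc14 yields the same three paths as the exception; with no path existing, resolve returns empty, which is the failure above.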

If the parent hfile is still present, everything works OK and the clone region opens successfully.
----------------------------------------------------------------------------

2013-06-18 14:58:50,026 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 1 region(s)
2013-06-18 14:58:50,026 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: table1_clone,,1371578484233.6dab7a8a16b0e195785d52ad7b15bd09.
2013-06-18 14:58:50,031 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13f58635d150013 Attempting to transition node 6dab7a8a16b0e195785d52ad7b15bd09 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2013-06-18 14:58:50,033 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13f58635d150013 Successfully transitioned node 6dab7a8a16b0e195785d52ad7b15bd09 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2013-06-18 14:58:50,034 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: {NAME => 'table1_clone,,1371578484233.6dab7a8a16b0e195785d52ad7b15bd09.', STARTKEY => '', ENDKEY => 'user1959958463', ENCODED => 6dab7a8a16b0e195785d52ad7b15bd09,}
2013-06-18 14:58:50,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ...
2013-06-18 14:58:50,035 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated table1_clone,,1371578484233.6dab7a8a16b0e195785d52ad7b15bd09.
2013-06-18 14:58:50,039 INFO org.apache.hadoop.hbase.regionserver.Store: time to purge deletes set to 0ms in store family1
2013-06-18 14:58:50,039 INFO org.apache.hadoop.hbase.regionserver.Store: hbase.hstore.compaction.min = 3
2013-06-18 14:58:50,045 DEBUG org.apache.hadoop.hbase.regionserver.StoreFile: reference 'hdfs://hdtest009:9000/hbase/table1_clone/6dab7a8a16b0e195785d52ad7b15bd09/family1/table1=352470e8ef4d15b034ab1165b07e35e3-9014c0eed1c0418daf3d42882baecf24.352470e8ef4d15b034ab1165b07e35e3' to region=352470e8ef4d15b034ab1165b07e35e3 hfile=table1=352470e8ef4d15b034ab1165b07e35e3-9014c0eed1c0418daf3d42882baecf24
2013-06-18 14:58:50,049 DEBUG org.apache.hadoop.hbase.regionserver.StoreFile: Store file hdfs://hdtest009:9000/hbase/table1_clone/6dab7a8a16b0e195785d52ad7b15bd09/family1/table1=352470e8ef4d15b034ab1165b07e35e3-9014c0eed1c0418daf3d42882baecf24.352470e8ef4d15b034ab1165b07e35e3 is a bottom reference to hdfs://hdtest009:9000/hbase/table1_clone/352470e8ef4d15b034ab1165b07e35e3/family1/table1=352470e8ef4d15b034ab1165b07e35e3-9014c0eed1c0418daf3d42882baecf24
2013-06-18 14:58:50,062 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hdtest009:9000/hbase/table1_clone/6dab7a8a16b0e195785d52ad7b15bd09/family1/table1=352470e8ef4d15b034ab1165b07e35e3-9014c0eed1c0418daf3d42882baecf24.352470e8ef4d15b034ab1165b07e35e3, isReference=true, isBulkLoadResult=false, seqid=32549, majorCompaction=false
2013-06-18 14:58:50,064 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined table1_clone,,1371578484233.6dab7a8a16b0e195785d52ad7b15bd09.; next sequenceid=32550

                
> possible loss of data in snapshot taken after region split
> ----------------------------------------------------------
>
>                 Key: HBASE-8760
>                 URL: https://issues.apache.org/jira/browse/HBASE-8760
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.94.8
>            Reporter: Jerry He
>            Assignee: Jerry He
>             Fix For: 0.94.8
>
>         Attachments: HBase-8760-0.94.8.patch
>
>
> Right after a region split, but before the daughter regions are compacted, we have two daughter regions containing Reference files to the parent hfiles.
> If we take a snapshot at that moment, the snapshot will succeed, but it will contain only the daughter Reference files. Since nothing holds on to the parent hfiles, the HFile Cleaner will delete them soon afterwards, once the daughter regions no longer need them.
> At a minimum, we need to keep these parent hfiles from being deleted.
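One way to read "keep these parent hfiles from being deleted": every Reference captured in the snapshot names a parent hfile, and a cleaner could refuse to delete any parent hfile named that way. The Java sketch below shows only that bookkeeping. SnapshotParentHoldSketch and its methods are hypothetical; they are not the attached HBase-8760-0.94.8.patch and not HBase's cleaner API, and region names are assumed to be 32 lowercase hex characters.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not HBase API): remembers which parent hfiles a snapshot's
// Reference files depend on, so a cleaner could skip them.
final class SnapshotParentHoldSketch {
    // Reference file name: "<hfile>.<encodedParentRegion>" (32 hex chars assumed).
    private static final Pattern REF = Pattern.compile("^(.+)\\.([0-9a-f]{32})$");

    // Keys of the form "<parentRegion>/<family>/<hfile>".
    private final Set<String> held = new HashSet<>();

    /** Records the parent hfile behind a snapshot store file; plain hfiles are ignored. */
    void addSnapshotStoreFile(String family, String fileName) {
        Matcher m = REF.matcher(fileName);
        if (m.matches()) {
            held.add(m.group(2) + "/" + family + "/" + m.group(1));
        }
    }

    /** False while some snapshot Reference still points at this parent hfile. */
    boolean isDeletable(String region, String family, String hfile) {
        return !held.contains(region + "/" + family + "/" + hfile);
    }

    Set<String> heldFiles() {
        return Collections.unmodifiableSet(held);
    }
}
```

Fed the two daughter files from the snapshot listing in the comment, the hold contains a single entry, because both daughters refer to the same parent hfile; a set keeps the duplicate from being counted twice.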

