hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-19343) Restore snapshot makes parent split region online
Date Fri, 24 Nov 2017 16:23:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265400#comment-16265400 ]

Ted Yu commented on HBASE-19343:
--------------------------------

I looked at the code in RestoreSnapshotHelper on the master branch where getTableRegions() is used.

Did you observe this problem in any other scenario?
If not, option #1 seems better, since the problem is only triggered by a snapshot restore.

If you have bandwidth to work on this issue, the first step would be to write a unit test
that reproduces the behavior.

Thanks

> Restore snapshot makes parent split region online 
> --------------------------------------------------
>
>                 Key: HBASE-19343
>                 URL: https://issues.apache.org/jira/browse/HBASE-19343
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>            Reporter: Pankaj Kumar
>            Assignee: Pankaj Kumar
>         Attachments: Snapshot.jpg
>
>
> Restoring a snapshot brings the split parent region back online, as shown in the attached screenshot.
> Steps to reproduce
> =====================
> 1. Create a table.
> 2. Insert a few records into the table.
> 3. Flush the table.
> 4. Split the table.
> 5. Create a snapshot before the catalog janitor clears the parent region entry from meta.
> 6. Restore the snapshot.
> The problem is visible in the meta entries.
> Meta content before the snapshot restore:
> {noformat}
>  t1,,1511537529449.077a12b0b3c91b053fa95223635f9543.         column=info:regioninfo, timestamp=1511537565964, value={ENCODED => 077a12b0b3c91b053fa95223635f9543, NAME => 't1,,1511537529449.077a12b0b3c91b053fa95223635f9543.', STARTKEY => '', ENDKEY => '', OFFLINE => true, SPLIT => true}
>  t1,,1511537529449.077a12b0b3c91b053fa95223635f9543.         column=info:seqnumDuringOpen, timestamp=1511537530107, value=\x00\x00\x00\x00\x00\x00\x00\x02
>  t1,,1511537529449.077a12b0b3c91b053fa95223635f9543.         column=info:server, timestamp=1511537530107, value=host-xx:16020
>  t1,,1511537529449.077a12b0b3c91b053fa95223635f9543.         column=info:serverstartcode, timestamp=1511537530107, value=1511537511523
>  t1,,1511537529449.077a12b0b3c91b053fa95223635f9543.         column=info:splitA, timestamp=1511537565964, value={ENCODED => 3c7c866d4df370c586131a4cbe0ef6a8, NAME => 't1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.', STARTKEY => '', ENDKEY => 'm'}
>  t1,,1511537529449.077a12b0b3c91b053fa95223635f9543.         column=info:splitB, timestamp=1511537565964, value={ENCODED => dc7facd824c85b94e5bf6a2e6b5f5efc, NAME => 't1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.', STARTKEY => 'm', ENDKEY => ''}
>  t1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.         column=info:regioninfo, timestamp=1511537566075, value={ENCODED => 3c7c866d4df370c586131a4cbe0ef6a8, NAME => 't1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.', STARTKEY => '', ENDKEY => 'm'}
>  t1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.         column=info:seqnumDuringOpen, timestamp=1511537566075, value=\x00\x00\x00\x00\x00\x00\x00\x02
>  t1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.         column=info:server, timestamp=1511537566075, value=host-xx:16020
>  t1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.         column=info:serverstartcode, timestamp=1511537566075, value=1511537511523
>  t1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.        column=info:regioninfo, timestamp=1511537566069, value={ENCODED => dc7facd824c85b94e5bf6a2e6b5f5efc, NAME => 't1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.', STARTKEY => 'm', ENDKEY => ''}
>  t1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.        column=info:seqnumDuringOpen, timestamp=1511537566069, value=\x00\x00\x00\x00\x00\x00\x00\x08
>  t1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.        column=info:server, timestamp=1511537566069, value=host-xx:16020
>  t1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.        column=info:serverstartcode, timestamp=1511537566069, value=1511537511523
> {noformat}
> Meta content after the snapshot restore:
> {noformat}
>  t1,,1511537529449.077a12b0b3c91b053fa95223635f9543.         column=info:regioninfo, timestamp=1511537667635, value={ENCODED => 077a12b0b3c91b053fa95223635f9543, NAME => 't1,,1511537529449.077a12b0b3c91b053fa95223635f9543.', STARTKEY => '', ENDKEY => ''}
>  t1,,1511537529449.077a12b0b3c91b053fa95223635f9543.         column=info:seqnumDuringOpen, timestamp=1511537667635, value=\x00\x00\x00\x00\x00\x00\x00\x0A
>  t1,,1511537529449.077a12b0b3c91b053fa95223635f9543.         column=info:server, timestamp=1511537667635, value=host-xx:16020
>  t1,,1511537529449.077a12b0b3c91b053fa95223635f9543.         column=info:serverstartcode, timestamp=1511537667635, value=1511537511523
>  t1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.         column=info:regioninfo, timestamp=1511537667598, value={ENCODED => 3c7c866d4df370c586131a4cbe0ef6a8, NAME => 't1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.', STARTKEY => '', ENDKEY => 'm'}
>  t1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.         column=info:seqnumDuringOpen, timestamp=1511537667598, value=\x00\x00\x00\x00\x00\x00\x00\x0B
>  t1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.         column=info:server, timestamp=1511537667598, value=host-xx:16020
>  t1,,1511537565718.3c7c866d4df370c586131a4cbe0ef6a8.         column=info:serverstartcode, timestamp=1511537667598, value=1511537511523
>  t1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.        column=info:regioninfo, timestamp=1511537667621, value={ENCODED => dc7facd824c85b94e5bf6a2e6b5f5efc, NAME => 't1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.', STARTKEY => 'm', ENDKEY => ''}
>  t1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.        column=info:seqnumDuringOpen, timestamp=1511537667621, value=\x00\x00\x00\x00\x00\x00\x00\x0D
>  t1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.        column=info:server, timestamp=1511537667621, value=host-xx:16020
>  t1,m,1511537565718.dc7facd824c85b94e5bf6a2e6b5f5efc.        column=info:serverstartcode, timestamp=1511537667621, value=1511537511523
> {noformat}
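Reading the post-restore rows as key ranges makes the damage concrete: the parent 077a12b0b3c91b053fa95223635f9543 covers ['', ''), i.e. the whole table, the daughters cover ['', 'm') and ['m', ''), and none of the three rows carries OFFLINE/SPLIT any more. Below is a minimal, self-contained sketch (not HBase code; MetaOverlapCheck and its Region type are hypothetical names) that checks a list of online regions for overlapping key ranges. It compares keys as Java Strings, which matches HBase's byte-wise row-key order only for ASCII keys like the ones in the dump, and it treats an empty end key as "end of table".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports pairs of online regions whose key ranges overlap. */
class MetaOverlapCheck {

  /** Minimal stand-in for a meta row: encoded name plus [startKey, endKey). "" = unbounded. */
  static final class Region {
    final String encodedName;
    final String startKey;
    final String endKey;

    Region(String encodedName, String startKey, String endKey) {
      this.encodedName = encodedName;
      this.startKey = startKey;
      this.endKey = endKey;
    }
  }

  // Half-open ranges overlap when each one starts before the other ends.
  // An empty end key means "end of table", so nothing starts after it.
  static boolean overlaps(Region a, Region b) {
    boolean aStartsBeforeBEnds = b.endKey.isEmpty() || a.startKey.compareTo(b.endKey) < 0;
    boolean bStartsBeforeAEnds = a.endKey.isEmpty() || b.startKey.compareTo(a.endKey) < 0;
    return aStartsBeforeBEnds && bStartsBeforeAEnds;
  }

  // Pairwise check; fine for a handful of regions in a test.
  static List<String> findOverlaps(List<Region> online) {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < online.size(); i++) {
      for (int j = i + 1; j < online.size(); j++) {
        Region a = online.get(i);
        Region b = online.get(j);
        if (overlaps(a, b)) {
          result.add(a.encodedName + " <-> " + b.encodedName);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // The three regions meta lists after the restore, none flagged OFFLINE/SPLIT.
    List<Region> online = List.of(
        new Region("077a12b0b3c91b053fa95223635f9543", "", ""),   // split parent
        new Region("3c7c866d4df370c586131a4cbe0ef6a8", "", "m"),  // daughter A
        new Region("dc7facd824c85b94e5bf6a2e6b5f5efc", "m", "")); // daughter B
    findOverlaps(online).forEach(System.out::println);
  }
}
```

Fed the three regions from the dump, it reports the parent as overlapping both daughters and finds no overlap between the daughters themselves.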
> Root Cause:
> We don't update the region split information in the .regioninfo file in HDFS, but while restoring the snapshot we set the regioninfo based on the .regioninfo entries:
> {code}
>     // Identify which region are still available and which not.
>     // NOTE: we rely upon the region name as: "table name, start key, end key"
>     List<HRegionInfo> tableRegions = getTableRegions();
>     if (tableRegions != null) {
>       monitor.rethrowException();
>       for (HRegionInfo regionInfo: tableRegions) {
>         String regionName = regionInfo.getEncodedName();
>         if (regionNames.contains(regionName)) {
>           LOG.info("region to restore: " + regionName);
>           regionNames.remove(regionName);
>           metaChanges.addRegionToRestore(regionInfo);
>         } else {
>           LOG.info("region to remove: " + regionName);
>           metaChanges.addRegionToRemove(regionInfo);
>         }
>       }
>       // ... (rest of the method elided)
>     }
> {code}
> Here, the region list returned by getTableRegions() is read from HDFS, i.e. from the .regioninfo files.
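A simplified model of the quoted loop can double as the shape of a reproducing unit test. This is an assumption-laden sketch, not the actual RestoreSnapshotHelper: RestoreClassificationModel, RegionInfo and MetaChanges are hypothetical stand-ins (RegionInfo keeps only the encoded name and the offline/split flags of HRegionInfo). Because the restored RegionInfo objects come from the .regioninfo files on disk, and those files are not updated on split (per the root cause above), the split parent is classified as a region to restore with split=false, which matches the OFFLINE/SPLIT flags disappearing from its meta row after the restore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified model of the region classification loop quoted above. */
class RestoreClassificationModel {

  /** Stand-in for HRegionInfo with only the fields this example needs. */
  static final class RegionInfo {
    final String encodedName;
    final boolean offline;
    final boolean split;

    RegionInfo(String encodedName, boolean offline, boolean split) {
      this.encodedName = encodedName;
      this.offline = offline;
      this.split = split;
    }
  }

  /** Stand-in for the meta changes collected during a restore. */
  static final class MetaChanges {
    final List<RegionInfo> toRestore = new ArrayList<>();
    final List<RegionInfo> toRemove = new ArrayList<>();
    final List<String> toAdd = new ArrayList<>(); // only in the snapshot, not on disk
  }

  /**
   * Mirrors the quoted loop: a region read from disk is restored if the snapshot has the
   * same encoded name, otherwise removed. The restored RegionInfo is the on-disk one.
   */
  static MetaChanges classify(List<RegionInfo> tableRegionsFromHdfs, Set<String> snapshotRegionNames) {
    Set<String> regionNames = new TreeSet<>(snapshotRegionNames);
    MetaChanges metaChanges = new MetaChanges();
    for (RegionInfo regionInfo : tableRegionsFromHdfs) {
      String regionName = regionInfo.encodedName;
      if (regionNames.contains(regionName)) {
        regionNames.remove(regionName);
        metaChanges.toRestore.add(regionInfo);
      } else {
        metaChanges.toRemove.add(regionInfo);
      }
    }
    metaChanges.toAdd.addAll(regionNames); // leftovers exist only in the snapshot
    return metaChanges;
  }

  public static void main(String[] args) {
    // Per the root cause, .regioninfo is not rewritten on split, so the parent's
    // file on disk still says offline=false, split=false.
    List<RegionInfo> fromHdfs = List.of(
        new RegionInfo("077a12b0b3c91b053fa95223635f9543", false, false), // split parent
        new RegionInfo("3c7c866d4df370c586131a4cbe0ef6a8", false, false), // daughter A
        new RegionInfo("dc7facd824c85b94e5bf6a2e6b5f5efc", false, false)); // daughter B
    // Assumed snapshot contents: all three regions (meta lists all three after the restore).
    Set<String> inSnapshot = Set.of(
        "077a12b0b3c91b053fa95223635f9543",
        "3c7c866d4df370c586131a4cbe0ef6a8",
        "dc7facd824c85b94e5bf6a2e6b5f5efc");

    for (RegionInfo ri : classify(fromHdfs, inSnapshot).toRestore) {
      System.out.println("restore " + ri.encodedName + " offline=" + ri.offline + " split=" + ri.split);
    }
  }
}
```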
> There are two possible solutions:
> 1. Set the regioninfo based on the snapshot manifest details.
> 2. Update the .regioninfo file after a region split.
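Option 1 could look roughly like the following sketch. It assumes, as the option itself implies, that the snapshot manifest still records each region's offline/split flags; ManifestBasedRestoreSketch, its RegionInfo type and all of its methods are hypothetical names, not HBase API. Regions are still matched by encoded name, but the RegionInfo written back to meta comes from the manifest, and a region is treated as a split parent (and left unassigned) when it is both offline and split, mirroring the OFFLINE => true, SPLIT => true row in the pre-restore meta dump. Regions to remove are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of option 1: take restored RegionInfo from the snapshot manifest. */
class ManifestBasedRestoreSketch {

  /** Stand-in for the manifest's region info; assumed to keep the offline/split flags. */
  static final class RegionInfo {
    final String encodedName;
    final boolean offline;
    final boolean split;

    RegionInfo(String encodedName, boolean offline, boolean split) {
      this.encodedName = encodedName;
      this.offline = offline;
      this.split = split;
    }

    // Same flags as the pre-restore meta row: OFFLINE => true, SPLIT => true.
    boolean isSplitParent() {
      return offline && split;
    }
  }

  /** For each region on disk that the snapshot also has, use the manifest's RegionInfo. */
  static List<RegionInfo> regionsToRestore(List<String> encodedNamesOnHdfs,
                                           Map<String, RegionInfo> manifestRegions) {
    List<RegionInfo> result = new ArrayList<>();
    for (String name : encodedNamesOnHdfs) {
      RegionInfo fromManifest = manifestRegions.get(name);
      if (fromManifest != null) {
        result.add(fromManifest);
      }
    }
    return result;
  }

  /** Split parents keep their OFFLINE/SPLIT state and are not assigned. */
  static List<String> regionsToAssign(List<RegionInfo> restored) {
    List<String> names = new ArrayList<>();
    for (RegionInfo ri : restored) {
      if (!ri.isSplitParent()) {
        names.add(ri.encodedName);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    String parent = "077a12b0b3c91b053fa95223635f9543";
    String daughterA = "3c7c866d4df370c586131a4cbe0ef6a8";
    String daughterB = "dc7facd824c85b94e5bf6a2e6b5f5efc";
    Map<String, RegionInfo> manifest = Map.of(
        parent, new RegionInfo(parent, true, true),
        daughterA, new RegionInfo(daughterA, false, false),
        daughterB, new RegionInfo(daughterB, false, false));

    List<RegionInfo> restored = regionsToRestore(List.of(parent, daughterA, daughterB), manifest);
    System.out.println("assign: " + regionsToAssign(restored));
  }
}
```

With the parent recorded as offline and split in the manifest, only the two daughters end up in the assign list. Option 2 would instead rewrite the parent's .regioninfo at split time; option 1 keeps the change inside the restore path, which is the reason given above for preferring it.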



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
