hbase-issues mailing list archives

From "Bo Cui (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17992) The snapShot TimeoutException causes the cleanerChore thread to fail to complete the archive correctly
Date Mon, 08 May 2017 07:27:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000366#comment-16000366 ]

Bo Cui commented on HBASE-17992:

Does disabledTableSnapshot#exec need to set a total wait duration?
public void snapshotRegions(List<Pair<HRegionInfo, ServerName>> regionsAndLocations)
    throws IOException, KeeperException {
  ThreadPoolExecutor exec = SnapshotManifest.createExecutor(conf, "DisabledTableSnapshot");
  try {
    ModifyRegionUtils.editRegions(exec, regions, new ModifyRegionUtils.RegionEditTask() {
      @Override
      public void editRegion(final HRegionInfo regionInfo) throws IOException {
        // Writes one region's manifest under the snapshot working directory.
        snapshotManifest.addRegion(FSUtils.getTableDir(rootDir, snapshotTable), regionInfo);
      }
    });
  } catch (IOException e) {
    throw e;
  }
}

SnapshotManifest#addRegion() reads from memory and writes to HDFS:
Read memory -- does not take a long time.
Write HDFS -- HDFS has its own timeout and exception handling.
exec defaults to eight threads, so if an exception occurs, at most eight tasks are still executing.
So I think there is no need to set a total wait duration; we only need to ensure that all tasks end.
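
The idea of waiting for every task to end, with no overall deadline, can be sketched with plain java.util.concurrent. This is only an illustration under assumptions: the class WaitForAllTasks and the method runAll are hypothetical names, and the code is not the HBase implementation or the attached patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of HBase.
public class WaitForAllTasks {

  /**
   * Runs all tasks on a fixed pool and returns only after every task has
   * finished or been cancelled. The first failure is rethrown afterwards.
   */
  public static void runAll(List<Callable<Void>> tasks, int threads)
      throws IOException, InterruptedException {
    ExecutorService exec = Executors.newFixedThreadPool(threads);
    List<Future<Void>> futures = new ArrayList<>();
    IOException firstFailure = null;
    try {
      for (Callable<Void> task : tasks) {
        futures.add(exec.submit(task));
      }
      for (Future<Void> future : futures) {
        try {
          future.get();
        } catch (ExecutionException e) {
          if (firstFailure == null) {
            Throwable cause = e.getCause();
            firstFailure = (cause instanceof IOException)
                ? (IOException) cause : new IOException(cause);
            // Keep queued tasks from starting; running tasks are left alone.
            for (Future<Void> other : futures) {
              other.cancel(false);
            }
          }
        } catch (CancellationException e) {
          // Cancelled after an earlier failure; nothing to collect.
        }
      }
    } finally {
      exec.shutdown();
      // No total deadline: loop until the in-flight tasks have really ended.
      while (!exec.awaitTermination(1, TimeUnit.SECONDS)) {
        // still waiting for running tasks
      }
    }
    if (firstFailure != null) {
      throw firstFailure;
    }
  }
}
```

With a pool of eight threads, at most eight tasks can be in flight when the first failure is seen, so the loop around awaitTermination only waits for those.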

> The snapShot TimeoutException causes the cleanerChore thread to fail to complete the archive correctly
> ------------------------------------------------------------------------------------------------------
>                 Key: HBASE-17992
>                 URL: https://issues.apache.org/jira/browse/HBASE-17992
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.98.10, 1.3.0
>            Reporter: Bo Cui
>         Attachments: hbase-17992.patch
> The problem is that when a snapshot hits a TimeoutException or another exception, /hbase/.hbase-snapshot/tmp is not deleted correctly, which causes the cleanerChore to fail to complete the archive correctly.
> Modifying the configuration parameter (hbase.snapshot.master.timeout.millis = 600000) only reduces the probability of the problem occurring.
> So the solution is: on a multi-threaded exception or a TimeoutException, the main thread must wait until all the tasks are finished or canceled before it clears /hbase/.hbase-snapshot/tmp/snapshotName. Otherwise a task is likely to still write /hbase/.hbase-snapshot/tmp/snapshotName/region-manifest.
> The problem exists in both disabledTableSnapshot and enabledTableSnapshot. Because I'm currently using disabledTableSnapshot, I provide the patch for disabledTableSnapshot.
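
The ordering described in the issue (stop the tasks, wait for them, then clear the working directory) might look roughly like the sketch below. It uses a local directory and java.nio.file as a stand-in for HDFS; CleanupAfterTasks and stopThenDelete are hypothetical names, and this is not the attached patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

// Hypothetical helper; a local directory stands in for the snapshot tmp dir.
public class CleanupAfterTasks {

  /**
   * Stops the executor, waits until no task is running any more, and only
   * then deletes the working directory, so no late write can recreate it.
   */
  public static void stopThenDelete(ExecutorService exec, Path workingDir)
      throws IOException, InterruptedException {
    // Drop queued tasks and interrupt the running ones.
    exec.shutdownNow();
    // Tasks may ignore the interrupt, so wait until they have really ended.
    while (!exec.awaitTermination(1, TimeUnit.SECONDS)) {
      // still waiting for running tasks
    }
    if (!Files.exists(workingDir)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(workingDir)) {
      // Reverse order visits children before their parent directory.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }
}
```

If the delete ran before awaitTermination returned, a task that was still writing could recreate files under the directory after it had been cleared, which is the failure mode the issue describes.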

This message was sent by Atlassian JIRA
