hbase-user mailing list archives

From OpenInx <open...@gmail.com>
Subject Re: TimeoutException on Snapshots
Date Tue, 23 Jul 2019 14:30:15 GMT
> My question is: is it safe to ignore these TimeoutExceptions? if the
> SnapshotRegionManifests are not being written due to a timeout does that
> mean we are losing data or getting inconsistencies?
I don't think ignoring the TimeoutException is a good idea. You need to
find out why the snapshot is timing out. I ran into a similar case in my
cluster: it has 30 RS, but the table has 1600 regions, which means each RS
needs to flush and write the region manifests for 50+ regions, while the
SnapshotSubprocedurePool has a default threadSize of 3. It is easy to time
out in that case because the pool is too small.
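
To make the arithmetic concrete, here is a rough sketch of how many
sequential "waves" of region snapshots each RS has to get through. It assumes
regions are spread evenly across the RS and that each pool thread handles one
region at a time; the function name is hypothetical and this is only a
back-of-the-envelope model, not how HBase schedules the work internally.

```python
import math

def snapshot_waves(total_regions, region_servers, pool_threads):
    """Return (regions per RS, sequential waves per RS).

    Assumes an even region distribution and one region per pool thread
    at a time. Both are simplifications for illustration.
    """
    regions_per_rs = math.ceil(total_regions / region_servers)
    waves = math.ceil(regions_per_rs / pool_threads)
    return regions_per_rs, waves

# Cluster from this thread: 1600 regions on 30 RS.
default_case = snapshot_waves(1600, 30, 3)   # default pool size of 3
tuned_case = snapshot_waves(1600, 30, 20)    # concurrentTasks raised to 20

print(default_case)  # (54, 18)
print(tuned_case)    # (54, 3)
```

Under these assumptions, every wave has to fit inside the same region-side
timeout, so 18 waves with the default pool leaves far less time per region
than 3 waves with the larger pool.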

So I increased the config keys below (note that increasing
'hbase.snapshot.master.timeout.millis' alone is not enough, because the RS
can also time out) and did a rolling update of the cluster. It works pretty
well for me now.

hbase.snapshot.master.timeout.millis=1200000
hbase.snapshot.region.timeout=1200000
hbase.snapshot.region.concurrentTasks=20
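
If you keep these settings in hbase-site.xml (an assumption; this message
does not say where they were set), a small sketch like the following can
render the three key/value pairs above as <property> elements to paste
inside the existing <configuration> block. The helper name is hypothetical.
Assuming both timeouts are in milliseconds, 1200000 is 20 minutes.

```python
import xml.etree.ElementTree as ET

# The three settings listed above, copied verbatim.
SETTINGS = {
    "hbase.snapshot.master.timeout.millis": "1200000",
    "hbase.snapshot.region.timeout": "1200000",
    "hbase.snapshot.region.concurrentTasks": "20",
}

def to_property_xml(settings):
    """Render each key/value pair as a Hadoop-style <property> element."""
    lines = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        lines.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(lines)

fragment = to_property_xml(SETTINGS)
print(fragment)
```

The same values need to reach both the master and the RS, since both sides
have their own timeout.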

Hope this is helpful for you, Arwin.

On Tue, Jul 23, 2019 at 3:55 PM Arwin Tio <arwin.tio@hotmail.com> wrote:

> Hi all,
>
> I've been running into these issues after restoring from snapshots:
>
> https://issues.apache.org/jira/browse/HBASE-16464
> https://issues.apache.org/jira/browse/HBASE-17992
>
> Essentially, HRegion#addRegionToSnapshot has been timing out in
> TakeSnapshotHandler, resulting in some leftover tmp files. The leftover tmp
> files causes archivedHFileCleaner, which manifests in an extremely large
> archive folder that doesn't get cleaned up.
>
> HBASE-16464 solves the bloating archive folder by preventing the
> SnapshotRegionManifest from being written if the operation has timed out
> (see:
> https://github.com/apache/hbase/commit/ab011391ab392f1a62b6ea9bdca87fc950af42a9#diff-4ec74c1b12f2be4f52c33260fd8b73efR86
> )
>
> My question is: is it safe to ignore these TimeoutExceptions? if the
> SnapshotRegionManifests are not being written due to a timeout does that
> mean we are losing data or getting inconsistencies?
>
> If so, what are some potential remedies for this? I'm thinking we can just
> increase the timeout 'hbase.snapshot.master.timeout.millis' but is there a
> better way?
>
> Thanks
>
