hbase-user mailing list archives

From Ishan Chhabra <ichha...@rocketfuel.com>
Subject Re: snapshot timeout problem
Date Mon, 21 Jul 2014 17:48:12 GMT
The snapshot timeout properties are confusingly named; I dug through the
code some time ago to understand them. Use these:

  <property>
    <name>hbase.snapshot.master.timeoutMillis</name>
    <!-- Change from default of 60s to 600s to allow for slow flushing of tables -->
    <value>600000</value>
    <description>
      This is the time the HBase master waits for the snapshot operation to
      complete.
      Do not confuse this with hbase.snapshot.master.timeout.millis, which,
      although it sounds similar, serves a very different purpose.
      Note: This property had a completely different meaning before HBase
      version 0.94.11 and should not be set on a cluster that uses snapshots
      and runs a version before 0.94.11.
    </description>
  </property>
  <property>
    <name>hbase.snapshot.master.timeout.millis</name>
    <!-- Change from default of 60s to 600s to allow for slow flushing of tables -->
    <value>600000</value>
    <description>
      This is the timeout the master tells the client to wait for when it
      takes the snapshot. The client actually waits longer than this because
      of exponential backoff. See HBaseAdmin.snapshot for the exact algorithm.
    </description>
  </property>
  <property>
    <name>hbase.snapshot.region.timeout</name>
    <!-- Change from default of 60s to 600s to allow for slow flushing of tables -->
    <value>600000</value>
    <description>
      This is the time the region server waits to complete all of its
      activities for a snapshot operation.
    </description>
  </property>
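
To illustrate why the client can wait longer than the hint described for hbase.snapshot.master.timeout.millis, here is a minimal sketch of a polling loop with growing pauses. It is not the HBaseAdmin.snapshot algorithm: the doubling schedule, the 1 second first pause, and the name simulate_wait_ms are all assumptions made for illustration. The point is only that a loop which checks the deadline before each pause can overshoot it by most of the last pause.

```python
def simulate_wait_ms(expected_timeout_ms, first_pause_ms=1000):
    """Simulate how long a polling client waits in total, in milliseconds.

    Assumptions (not taken from HBase): pauses double on every try, and
    the deadline is checked only before each pause, never during it.
    """
    elapsed = 0
    tries = 0
    # Always poll at least once, then keep polling until the deadline passes.
    while tries == 0 or elapsed < expected_timeout_ms:
        elapsed += first_pause_ms * (2 ** tries)  # pause before the next check
        tries += 1
    return elapsed


if __name__ == "__main__":
    for hint_ms in (60000, 600000):
        print(hint_ms, "->", simulate_wait_ms(hint_ms))
```

Under these assumptions a 60000 ms hint turns into a 63000 ms wait, because the last pause starts at 31000 ms and runs for 32000 ms.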


On Mon, Jul 21, 2014 at 7:02 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
wrote:

> There are two timeout properties: one on the region server side and the
> other on the master side (the coordinator).
>
> "hbase.snapshot.master.timeoutMillis"
> "hbase.snapshot.region.timeout"
>
> Increasing only the master-side timeout has no effect, since the region
> server side will still send a timeout to the master after the default 60 sec.
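
A quick way to catch the situation described here (one timeout raised, the others left alone) is to read the site file and list which snapshot timeouts are still at 60 seconds. The sketch below is only an illustration: the function names are hypothetical, the 60000 ms default is the figure quoted in this thread rather than something read from HBase, and the sample XML stands in for a real hbase-site.xml.

```python
import xml.etree.ElementTree as ET

# The three property names discussed in this thread.
SNAPSHOT_TIMEOUT_KEYS = (
    "hbase.snapshot.master.timeoutMillis",
    "hbase.snapshot.master.timeout.millis",
    "hbase.snapshot.region.timeout",
)
# Assumption: 60 s default for all three, as stated in this thread.
ASSUMED_DEFAULT_MS = 60000


def read_snapshot_timeouts(site_xml_text):
    """Return {property name: timeout in ms}, using the assumed default if unset."""
    root = ET.fromstring(site_xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in SNAPSHOT_TIMEOUT_KEYS and value:
            found[name] = int(value)
    return {key: found.get(key, ASSUMED_DEFAULT_MS) for key in SNAPSHOT_TIMEOUT_KEYS}


def keys_left_at_default(timeouts):
    """List the snapshot timeout properties that still have the assumed default."""
    return [key for key, ms in timeouts.items() if ms == ASSUMED_DEFAULT_MS]


if __name__ == "__main__":
    # Sample input: only the master-side timeout has been raised.
    sample = """
    <configuration>
      <property>
        <name>hbase.snapshot.master.timeoutMillis</name>
        <value>600000</value>
      </property>
    </configuration>
    """
    print(keys_left_at_default(read_snapshot_timeouts(sample)))
```

With the sample input, hbase.snapshot.region.timeout is reported as still at the default, which matches the symptom in the original question.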
>
>
> Matteo
>
>
>
> On Mon, Jul 21, 2014 at 2:56 PM, Brian Jeltema <
> brian.jeltema@digitalenvoy.net> wrote:
>
> > There are 174 regions, not well balanced. One RegionServer has 69 regions.
> > That RegionServer generates a series of log entries (modified and shown
> > below), one for each region, at roughly 1 to 2 second intervals. The
> > timeout period expires when it reaches region 36.
> >
> > 2014-07-21 07:49:44,503 regionserver.HRegion: Creating references for hfiles
> > 2014-07-21 07:49:44,503 regionserver.HRegion: Adding snapshot references for [hdfs://xxx.digitalenvoy.net:8020/apps/hbase/data/data/default/hosts/31e2a098e9e311c4ddcfd3d8da28dfb6/p/3749b6df36c749508fe9c6f54ca425f2] hfiles
> > 2014-07-21 07:49:44,503 regionserver.HRegion: Creating reference for file (1/1) : hdfs://xxx.digitalenvoy.net:8020/apps/hbase/data/data/default/hosts/31e2a098e9e311c4ddcfd3d8da28dfb6/p/3749b6df36c749508fe9c6f54ca425f2
> > 2014-07-21 07:49:45,136 snapshot.FlushSnapshotSubprocedure: ... Flush Snapshotting region hosts,\x00\x03|\xBF!,1400600029600.31e2a098e9e311c4ddcfd3d8da28dfb6. completed.
> > 2014-07-21 07:49:45,137 snapshot.FlushSnapshotSubprocedure: Closing region operation on hosts,\x00\x03|\xBF!,1400600029600.31e2a098e9e311c4ddcfd3d8da28dfb6.
> > 2014-07-21 07:49:45,137 DEBUG [rs(xxx.digitalenvoy.net,60020,1405943192177)-snapshot-pool3-thread-1] snapshot.FlushSnapshotSubprocedure: Starting region operation on hosts,\x00\x8A\x90\xD6\x08,1400659179080.a74402fcbd9a96a7c92b250721095729.
> > 2014-07-21 07:49:45,137 DEBUG [member: 'xxx.digitalenvoy.net,60020,1405943192177' subprocedure-pool1-thread-2] snapshot.RegionServerSnapshotManager: Completed 1/174 local region snapshots.
> > 2014-07-21 07:49:45,137 snapshot.FlushSnapshotSubprocedure: Flush Snapshotting region hosts,\x00\x8A\x90\xD6\x08,1400659179080.a74402fcbd9a96a7c92b250721095729. started...
> > 2014-07-21 07:49:45,137 regionserver.HRegion: Storing region-info for snapshot.
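
The figures reported above (69 regions on the busiest RegionServer, roughly 1 to 2 seconds per region, timeout reached at region 36 after 60 seconds) allow a rough estimate of how large the timeout needs to be. The sketch assumes regions are snapshotted one after another on that server, which the log excerpt suggests but does not prove. The name estimate_timeout_ms and its safety_factor parameter are hypothetical and not HBase settings.

```python
import math


def estimate_timeout_ms(regions, seconds_per_region, safety_factor=2.0):
    """Estimate a snapshot timeout in milliseconds.

    Assumptions: regions are processed sequentially on one server, and
    safety_factor is an arbitrary headroom multiplier chosen by the operator.
    """
    return math.ceil(regions * seconds_per_region * safety_factor * 1000)


if __name__ == "__main__":
    regions_on_busiest_server = 69
    observed_rate = 60 / 36  # seconds per region: timeout hit at region 36 after 60 s
    for label, rate in (("fast", 1.0), ("observed", observed_rate), ("slow", 2.0)):
        print(label, estimate_timeout_ms(regions_on_busiest_server, rate), "ms")
```

Even without any headroom, 69 regions at 1 second each already need 69000 ms, which is over the 60 second default. The pessimistic case (2 seconds per region, doubled for headroom) comes to 276000 ms, so 600000 ms leaves a comfortable margin under these assumptions.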
> >
> > On Jul 21, 2014, at 9:21 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> > wrote:
> >
> > > Can you also tell us more about your table? How many regions on how many
> > > region servers?
> > >
> > >
> > > 2014-07-21 8:23 GMT-04:00 Ted Yu <yuzhihong@gmail.com>:
> > >
> > >> Normally such timeout is caused by one region server which is slow in
> > >> completing its part of the snapshot procedure.
> > >>
> > >> Have you looked at region server logs ?
> > >> Feel free to pastebin relevant portion.
> > >>
> > >> Thanks
> > >>
> > >> On Jul 21, 2014, at 4:03 AM, Brian Jeltema <brian.jeltema@digitalenvoy.net>
> > >> wrote:
> > >>
> > >>> I’m running HBase 0.98. I’m trying to snapshot a table, but it’s timing
> > >>> out after 60 seconds.
> > >>> I increased the value of hbase.snapshot.master.timeoutMillis and
> > >>> restarted HBase,
> > >>> but the timeout still happens after 60 seconds. Any suggestions?
> > >>>
> > >>> Brian
> > >>
> >
> >
>



-- 
*Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
