hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: snapshot timeout problem
Date Tue, 22 Jul 2014 13:18:25 GMT
The load balancer in 0.98 considers many factors when making balancing decisions.

Can you take a look at the master log and look for balancer-related lines?
That should give you some clues.

Cheers
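
A minimal sketch of the log check suggested above, assuming the master log is a plain-text file and that balancer-related entries contain the word "balancer" (matched case-insensitively). The function name and the log path passed on the command line are hypothetical, not part of HBase.

```python
import re
import sys
from typing import Iterable, List


def balancer_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention the balancer, ignoring case."""
    pattern = re.compile(r"balancer", re.IGNORECASE)
    # Strip only the trailing newline so the log text itself is unchanged.
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python balancer_lines.py /path/to/master.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for entry in balancer_lines(fh):
            print(entry)
```

A plain substring filter like this only narrows the log down; the matching entries still have to be read to see why the balancer did or did not move regions.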

On Jul 22, 2014, at 5:03 AM, Brian Jeltema <brian.jeltema@digitalenvoy.net> wrote:

> I ran the balancer from hbase shell, but don’t see any change. Is there a way to balance
> a specific table?
> 
>> bq. One RegionServer has 69 regions
>> 
>> Can you run the load balancer so that your regions are better balanced?
>> 
>> Cheers
>> 
>> 
>> On Mon, Jul 21, 2014 at 6:56 AM, Brian Jeltema <
>> brian.jeltema@digitalenvoy.net> wrote:
>> 
>>> There are 174 regions, not well balanced. One RegionServer has 69 regions.
>>> That RegionServer generates a
>>> series of log entries (modified and shown below), one for each region, at
>>> roughly 1 to 2 second intervals. The timeout period expires when
>>> it reaches region 36.
>>> 
>>> 2014-07-21 07:49:44,503 regionserver.HRegion: Creating references for hfiles
>>> 2014-07-21 07:49:44,503 regionserver.HRegion: Adding snapshot references for [hdfs://xxx.digitalenvoy.net:8020/apps/hbase/data/data/default/hosts/31e2a098e9e311c4ddcfd3d8da28dfb6/p/3749b6df36c749508fe9c6f54ca425f2] hfiles
>>> 2014-07-21 07:49:44,503 regionserver.HRegion: Creating reference for file (1/1) : hdfs://xxx.digitalenvoy.net:8020/apps/hbase/data/data/default/hosts/31e2a098e9e311c4ddcfd3d8da28dfb6/p/3749b6df36c749508fe9c6f54ca425f2
>>> 2014-07-21 07:49:45,136 snapshot.FlushSnapshotSubprocedure: ... Flush Snapshotting region hosts,\x00\x03|\xBF!,1400600029600.31e2a098e9e311c4ddcfd3d8da28dfb6. completed.
>>> 2014-07-21 07:49:45,137 snapshot.FlushSnapshotSubprocedure: Closing region operation on hosts,\x00\x03|\xBF!,1400600029600.31e2a098e9e311c4ddcfd3d8da28dfb6.
>>> 2014-07-21 07:49:45,137 DEBUG [rs(xxx.digitalenvoy.net,60020,1405943192177)-snapshot-pool3-thread-1] snapshot.FlushSnapshotSubprocedure: Starting region operation on hosts,\x00\x8A\x90\xD6\x08,1400659179080.a74402fcbd9a96a7c92b250721095729.
>>> 2014-07-21 07:49:45,137 DEBUG [member: 'xxx.digitalenvoy.net,60020,1405943192177' subprocedure-pool1-thread-2] snapshot.RegionServerSnapshotManager: Completed 1/174 local region snapshots.
>>> 2014-07-21 07:49:45,137 snapshot.FlushSnapshotSubprocedure: Flush Snapshotting region hosts,\x00\x8A\x90\xD6\x08,1400659179080.a74402fcbd9a96a7c92b250721095729. started...
>>> 2014-07-21 07:49:45,137 regionserver.HRegion: Storing region-info for snapshot.
>>> 
>>> On Jul 21, 2014, at 9:21 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
>>> wrote:
>>> 
>>>> Can you also tell us more about your table? How many regions on how many
>>>> region servers?
>>>> 
>>>> 
>>>> 2014-07-21 8:23 GMT-04:00 Ted Yu <yuzhihong@gmail.com>:
>>>> 
>>>>> Normally such a timeout is caused by one region server which is slow in
>>>>> completing its part of the snapshot procedure.
>>>>> 
>>>>> Have you looked at the region server logs?
>>>>> Feel free to pastebin the relevant portion.
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> On Jul 21, 2014, at 4:03 AM, Brian Jeltema <brian.jeltema@digitalenvoy.net> wrote:
>>>>> 
>>>>>> I’m running HBase 0.98. I’m trying to snapshot a table, but it’s timing
>>>>>> out after 60 seconds.
>>>>>> I increased the value of hbase.snapshot.master.timeoutMillis and
>>>>>> restarted HBase,
>>>>>> but the timeout still happens after 60 seconds. Any suggestions?
>>>>>> 
>>>>>> Brian
> 
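
The figures quoted in the thread (69 regions on one RegionServer, one region snapshotted every 1 to 2 seconds, a 60-second timeout reached at about region 36) can be checked with some rough arithmetic. The sketch below assumes the regions on that server are snapshotted one after another at a steady rate, which the log excerpt suggests but does not prove. The function names are illustrative only.

```python
def observed_seconds_per_region(timeout_s: float, regions_done: int) -> float:
    """Average time per region implied by how far the server got before the timeout."""
    return timeout_s / regions_done


def snapshot_time_estimate(regions_on_server: int, seconds_per_region: float) -> float:
    """Estimated time for one server to snapshot all of its regions sequentially."""
    return regions_on_server * seconds_per_region


if __name__ == "__main__":
    timeout_s = 60.0        # timeout reported in the thread
    regions_done = 36       # region reached when the timeout expired
    regions_on_server = 69  # regions hosted by the slow RegionServer

    rate = observed_seconds_per_region(timeout_s, regions_done)
    print(f"observed rate: about {rate:.2f} s per region")
    print(f"estimate at observed rate: about {snapshot_time_estimate(regions_on_server, rate):.0f} s")
    # Bounds from the "roughly 1 to 2 second intervals" description.
    print(f"estimate at 1 s per region: {snapshot_time_estimate(regions_on_server, 1.0):.0f} s")
    print(f"estimate at 2 s per region: {snapshot_time_estimate(regions_on_server, 2.0):.0f} s")
```

Under these assumptions even the 1-second lower bound needs more than 60 seconds for 69 regions, which is consistent with both suggestions in the thread: spread the regions more evenly, or make a longer timeout actually take effect.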
