hbase-user mailing list archives

From Brian Jeltema <brian.jelt...@digitalenvoy.net>
Subject Re: periodicFlusher get stuck
Date Tue, 24 Feb 2015 16:42:34 GMT
I should have mentioned that the timeout is fixed by killing the region server that owns the
region in question.

I’ve restarted the cluster, so all the ‘bad state’ is gone.

hbase.regionserver.optionalcacheflushinterval is not defined, so the default applies. These
periodic messages can go on for days.
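
For anyone who wants to check how regular these requests are in their own region server log, here is a minimal sketch. It assumes the unwrapped, one-message-per-line form of the log output quoted below; the regex and the function name `flush_request_gaps` are illustrative and not part of HBase.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line shape (one log message per line, as in the region server log):
#   <date> <time>,<ms> INFO [...periodicFlusher] ... requesting flush for
#   region <region name> after a delay of <ms>
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"periodicFlusher requesting flush for region (.+?) after a delay of (\d+)"
)


def flush_request_gaps(lines):
    """Return {region name: [seconds between consecutive flush requests]}."""
    last_seen = {}
    gaps = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a periodicFlusher request line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        region = match.group(2)
        if region in last_seen:
            gaps[region].append((stamp - last_seen[region]).total_seconds())
        last_seen[region] = stamp
    return dict(gaps)


if __name__ == "__main__":
    import sys

    # Usage: python flush_gaps.py /path/to/regionserver.log
    with open(sys.argv[1], errors="replace") as log:
        for region, deltas in flush_request_gaps(log).items():
            print(region, len(deltas) + 1, "requests, smallest gap",
                  min(deltas), "s")
```

A region that shows up with a long run of roughly equal gaps is one where the flush keeps being requested; the script only measures the cadence of the requests and says nothing about why the flush is not completing.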


hbase-site.xml (slightly edited to replace node names):

<!--Tue Jul 22 11:37:35 2014-->
    <configuration>
    
    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.4</value>
    </property>
    
    <property>
      <name>hbase.master.balancer.stochastic.tableSkewCost</name>
      <value>100</value>
    </property>
    
    <property>
      <name>hbase.hstore.flush.retries.number</name>
      <value>120</value>
    </property>
    
    <property>
      <name>hbase.client.keyvalue.maxsize</name>
      <value>10485760</value>
    </property>
    
    <property>
      <name>hbase.tmp.dir</name>
      <value>/hdfs-1/hadoop/hbase</value>
    </property>
    
    <property>
      <name>hbase.hstore.compactionThreshold</name>
      <value>3</value>
    </property>
    
    <property>
      <name>hbase.snapshot.master.timeoutMillis</name>
      <value>120000</value>
    </property>
    
    <property>
      <name>hbase.security.authentication</name>
      <value>simple</value>
    </property>
    
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>10737418240</value>
    </property>
    
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.40</value>
    </property>
    
    <property>
      <name>hbase.defaults.for.version.skip</name>
      <value>true</value>
    </property>
    
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
    </property>
    
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>node1,node2,node3</value>
    </property>
    
    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>60</value>
    </property>
    
    <property>
      <name>zookeeper.znode.parent</name>
      <value>/hbase-unsecure</value>
    </property>
    
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <value>10</value>
    </property>
    
    <property>
      <name>hbase.hregion.majorcompaction</name>
      <value>86400000</value>
    </property>
    
    <property>
      <name>hbase.regionserver.global.memstore.lowerLimit</name>
      <value>0.38</value>
    </property>
    
    <property>
      <name>hbase.security.authorization</name>
      <value>false</value>
    </property>
    
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <value>2</value>
    </property>
    
    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>134217728</value>
    </property>
    
    <property>
      <name>hbase.superuser</name>
      <value>hbase</value>
    </property>
    
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://node1:8020/apps/hbase/data</value>
    </property>
    
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    
    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>true</value>
    </property>
    
    <property>
      <name>hbase.client.scanner.caching</name>
      <value>100</value>
    </property>
    
    <property>
      <name>hbase.zookeeper.useMulti</name>
      <value>true</value>
    </property>
    
    <property>
      <name>zookeeper.session.timeout</name>
      <value>30000</value>
    </property>
    
    </configuration>


On Feb 24, 2015, at 11:28 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Interesting...
> 
> Can you share your hbase-site.xml? Have you set
> hbase.regionserver.optionalcacheflushinterval?
> 
> Can you hadoop fs -ls -R this region folder?
> 
> 2015-02-24 11:15 GMT-05:00 Brian Jeltema <brian.jeltema@digitalenvoy.net>:
> 
>> I’m seeing occasional HBase log output similar to the output shown below.
>> It appears there is a request to flush a region, repeated every 10
>> seconds, that apparently is never being performed. It’s causing MR jobs to
>> time out because they cannot write to this region. Is this a known problem?
>> 
>> hbase version 0.98.0.2.1.2.1-471-hadoop2
>> hadoop version 2.4.0.2.1.2.1-471
>> 
>> 
>> 2015-02-23 14:51:47,612 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1415044750009.6ec50faa43a312cd6465d991e5984ec6. after a
>> delay of 13758
>> 2015-02-23 14:51:57,611 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1415044750009.6ec50faa43a312cd6465d991e5984ec6. after a
>> delay of 18080
>> 2015-02-23 14:52:07,611 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1415044750009.6ec50faa43a312cd6465d991e5984ec6. after a
>> delay of 17701
>> 2015-02-23 14:52:17,612 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1415044750009.6ec50faa43a312cd6465d991e5984ec6. after a
>> delay of 19090
>> 2015-02-23 14:52:27,616 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1415044750009.6ec50faa43a312cd6465d991e5984ec6. after a
>> delay of 4042
>> 2015-02-23 14:52:37,615 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1415044750009.6ec50faa43a312cd6465d991e5984ec6. after a
>> delay of 12968
>> 2015-02-23 18:12:03,307 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1424724136146.48d4d3fa0e02a97a8a1d9b85d5cf0162. after a
>> delay of 10482
>> 2015-02-23 18:12:13,308 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1424724136146.48d4d3fa0e02a97a8a1d9b85d5cf0162. after a
>> delay of 14829
>> 2015-02-23 19:15:13,330 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1424724136146.48d4d3fa0e02a97a8a1d9b85d5cf0162. after a
>> delay of 22888
>> 2015-02-23 19:15:23,329 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1424724136146.48d4d3fa0e02a97a8a1d9b85d5cf0162. after a
>> delay of 21081
>> 2015-02-23 19:15:33,329 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1424724136146.48d4d3fa0e02a97a8a1d9b85d5cf0162. after a
>> delay of 6387
>> 2015-02-23 20:50:23,368 INFO  [regionserver60020.periodicFlusher]
>> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
>> flush for region
>> Host,\x00_m\xB8\x06,1424724136146.48d4d3fa0e02a97a8a1d9b85d5cf0162. after a
>> delay of 8828
>> 
>> 

