hbase-dev mailing list archives

From Mickey <huanfeng...@gmail.com>
Subject Re: Is flushOneForGlobalPressure too frequently?
Date Fri, 02 Aug 2013 06:08:53 GMT
Hi, all
Sorry for the noise. The problem was caused by a mistake in applying a
patch.

Best regards,
Mickey


2013/8/1 Mickey <huanfeng388@gmail.com>

> Hi, all
> We hit a problem with extremely long GC pauses that made all of the region
> servers go down.
> In the region server's log, I found a large number of entries like the
> following:
>
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for
> region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.,
> flushing=true, writesEnabled=true
> 2013-07-30 19:31:12,223 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.
> due to global heap pressure
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for
> region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.,
> flushing=true, writesEnabled=true
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
> because memory above low water=11.2g
> 2013-07-30 19:31:12,223 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.
> due to global heap pressure
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for
> region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.,
> flushing=true, writesEnabled=true
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
> because memory above low water=11.2g
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
> because memory above low water=11.2g
> 2013-07-30 19:31:12,223 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.
> due to global heap pressure
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for
> region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.,
> flushing=true, writesEnabled=true
> 2013-07-30 19:31:12,223 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.
> due to global heap pressure
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for
> region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.,
> flushing=true, writesEnabled=true
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
> because memory above low water=11.2g
> 2013-07-30 19:31:12,223 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.
> due to global heap pressure
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for
> region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.,
> flushing=true, writesEnabled=true
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
> because memory above low water=11.2g
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
> because memory above low water=11.2g
> 2013-07-30 19:31:12,223 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.
> due to global heap pressure
> 2013-07-30 19:31:12,223 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for
> region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.,
> flushing=true, writesEnabled=true
> 2013-07-30 19:31:12,224 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.
> due to global heap pressure
> 2013-07-30 19:31:12,224 DEBUG
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
> because memory above low water=11.2g
> 2013-07-30 19:31:12,224 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for
> region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.,
> flushing=true, writesEnabled=true
> 2013-07-30 19:31:12,224 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> TEST_0,8613786870935_425284498467_53049226674,1373103980810.b5bc4c557faf21d71afc79c4e7f91f05.
> due to global heap pressure
>
> It sometimes produces more than 1 GB of logs in one minute. And after some
> time, the region server runs an extremely long GC following a promotion
> failure.
>
> In our environment, there was a datanode with a bad network connection that
> kept losing packets. The region server flushed HFiles very slowly whenever it
> tried to flush to that datanode. After removing the datanode, the situation
> is good now.
>
> My guess: there are too many flushOneForGlobalPressure attempts in this
> situation, and they produce too many Strings (log messages) and other
> objects.
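>
> To illustrate the allocation rate (a standalone sketch for demonstration
> only, not HBase code; the class name and loop body are hypothetical): a loop
> that never sleeps and builds a new log String on every pass generates garbage
> as fast as the CPU allows, which would match the more than 1 GB of identical
> log lines per minute that we saw.
>
> public class SpinLogPressure {
>   public static void main(String[] args) {
>     long iterations = 0;
>     long start = System.nanoTime();
>     while (System.nanoTime() - start < 1000000000L) { // run for ~1 second
>       // Every pass allocates a fresh String, just like the repeated
>       // "Flush of region ... due to global heap pressure" messages.
>       String msg = "Flush thread woke up because memory above low water=" + iterations;
>       if (msg.isEmpty()) {
>         return; // never taken; only keeps msg from being optimized away
>       }
>       iterations++;
>     }
>     System.out.println("Strings allocated in ~1s: " + iterations);
>   }
> }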
>
>
> I checked the code of MemStoreFlusher in the HBase 0.94 branch (trunk looks
> the same):
>
> public void run() {
>     while (!this.server.isStopped()) {
>       FlushQueueEntry fqe = null;
>       try {
>         wakeupPending.set(false); // allow someone to wake us up again
>         fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>         if (fqe == null || fqe instanceof WakeupFlushThread) {
>           if (isAboveLowWaterMark()) {
>             LOG.debug("Flush thread woke up because memory above low water=" +
>                 StringUtils.humanReadableInt(this.globalMemStoreLimitLowMark));
>             if (!flushOneForGlobalPressure()) {
>               // Wasn't able to flush any region, but we're above low water mark
>               // This is unlikely to happen, but might happen when closing the
>               // entire server - another thread is flushing regions. We'll just
>               // sleep a little bit to avoid spinning, and then pretend that
>               // we flushed one, so anyone blocked will check again
>               lock.lock();
>               try {
>                 Thread.sleep(1000);
>                 flushOccurred.signalAll();
>               } finally {
>                 lock.unlock();
>               }
>             }
>             // Enqueue another one of these tokens so we'll wake up again
>             wakeupFlushThread();
>           }
>           continue;
>         }
>         FlushRegionEntry fre = (FlushRegionEntry)fqe;
>         if (!flushRegion(fre)) {
>           break;
>         }
>       } catch (InterruptedException ex) {
>         continue;
>       } catch (ConcurrentModificationException ex) {
>         continue;
>       } catch (Exception ex) {
>         LOG.error("Cache flusher failed for entry " + fqe, ex);
>         if (!server.checkFileSystem()) {
>           break;
>         }
>       }
>     }
>     ...
> }
>
> In this case, each iteration of the while loop finishes almost immediately,
> and the loop spins again and again.
>
> Is my guess reasonable?
> If so, should we add a short sleep in the loop (for example, something like
> the standalone sketch below)? Or is there some configuration we can set?
> I know that if the region server is quite busy, a sleep may slow down insert
> operations, but in my opinion blocking inserts is more acceptable than the
> server going down.
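>
> Here is the kind of thing I mean, as a rough standalone sketch (the constant
> name is hypothetical, not an existing HBase configuration): rate-limiting the
> global-pressure wakeups with a short sleep bounds the flusher at roughly 100
> retries per second instead of letting it spin freely.
>
> public class ThrottledWakeupDemo {
>   static final long MIN_WAKEUP_INTERVAL_MS = 10; // hypothetical knob
>
>   public static void main(String[] args) throws InterruptedException {
>     long wakeups = 0;
>     long start = System.currentTimeMillis();
>     while (System.currentTimeMillis() - start < 1000) { // ~1s of sustained pressure
>       boolean aboveLowWaterMark = true; // stands in for isAboveLowWaterMark()
>       if (aboveLowWaterMark) {
>         wakeups++; // where flushOneForGlobalPressure() would be attempted
>         Thread.sleep(MIN_WAKEUP_INTERVAL_MS); // the proposed short sleep
>       }
>     }
>     System.out.println("Wakeups in ~1s with throttle: " + wakeups);
>   }
> }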
>
> Hope to get your suggestions.
>
> Thanks,
> Mickey
>
