hbase-dev mailing list archives

From "Frank Chow" <zhoushuaif...@gmail.com>
Subject Re: Re: bulkLoadHFiles failed because write lock on HRegion
Date Sun, 18 May 2014 16:34:58 GMT
I use 0.94.2, I haven't disabled major compaction, and the bulk load is performed once a day.
I looked into the 0.98 release code; it seems to have the same problem.

The exception occurs occasionally. When it does, the region server log shows a big compaction
running on the region. The longest took about an hour.
Trying to limit compactions will not solve the problem fundamentally.

The read lock is held for the whole compaction process, whether it is major or not, and it
conflicts with the write lock required by the bulk load.
I think it seems a tragedy :(

HRegion:

  public boolean compact(CompactionRequest cr)
  throws IOException {
    if (cr == null) {
      return false;
    }
    if (this.closing.get() || this.closed.get()) {
      LOG.debug("Skipping compaction on " + this + " because closing/closed");
      return false;
    }
    Preconditions.checkArgument(cr.getHRegion().equals(this));
    // block waiting for the lock for compaction
    lock.readLock().lock();
    MonitoredTask status = TaskMonitor.get().createStatus(
        "Compacting " + cr.getStore() + " in " + this);
    try {
      if (this.closed.get()) {
        LOG.debug("Skipping compaction on " + this + " because closed");
        return false;
      }
      boolean decr = true;
      try {
        synchronized (writestate) {
          if (writestate.writesEnabled) {
            ++writestate.compacting;
          } else {
            String msg = "NOT compacting region " + this + ". Writes disabled.";
            LOG.info(msg);
            status.abort(msg);
            decr = false;
            return false;
          }
        }
        LOG.info("Starting compaction on " + cr.getStore() + " in region "
            + this + (cr.getCompactSelection().isOffPeakCompaction()?" as an off-peak compaction":""));
        doRegionCompactionPrep();
        try {
          status.setStatus("Compacting store " + cr.getStore());
          cr.getStore().compact(cr);
        } catch (InterruptedIOException iioe) {
          String msg = "compaction interrupted by user";
          LOG.info(msg, iioe);
          status.abort(msg);
          return false;
        }
      } finally {
        if (decr) {
          synchronized (writestate) {
            --writestate.compacting;
            if (writestate.compacting <= 0) {
              writestate.notifyAll();
            }
          }
        }
      }
      status.markComplete("Compaction complete");
      return true;
    } finally {
      status.cleanup();
      lock.readLock().unlock();
    }
  }
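The conflict can be reproduced outside HBase with a plain ReentrantReadWriteLock, which is the lock type HRegion uses. This is just an illustrative sketch of the lock semantics, not HBase code (the method name and timings are made up): a long-lived "compaction" reader keeps a "bulk load" writer from acquiring the lock before its timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockConflictDemo {

    // Simulates a bulk load (write lock with a timeout) racing a
    // compaction that holds the read lock for compactionMs.
    // Returns true if the "bulk load" got the write lock in time.
    static boolean bulkLoadDuringCompaction(long compactionMs, long rpcTimeoutMs)
            throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readLockHeld = new CountDownLatch(1);

        Thread compaction = new Thread(() -> {
            lock.readLock().lock();   // held for the whole "compaction"
            try {
                readLockHeld.countDown();
                Thread.sleep(compactionMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        compaction.start();
        readLockHeld.await();

        // The "bulk load": a writer cannot proceed while any reader
        // holds the lock, so it waits and may time out first.
        boolean acquired = lock.writeLock().tryLock(rpcTimeoutMs, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.writeLock().unlock();
        }
        compaction.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // A compaction shorter than the timeout lets the writer in;
        // a compaction longer than the timeout starves it.
        System.out.println("short compaction: " + bulkLoadDuringCompaction(100, 1000));
        System.out.println("long compaction:  " + bulkLoadDuringCompaction(1000, 100));
    }
}
```

This is exactly the shape of the failure: HRegion.compact() takes lock.readLock() up front and releases it only in the outer finally, so a writer (the multi-family bulk load) waits for the whole compaction.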


Cheers




From: Ted Yu
Date: 2014-05-18 23:36
To: zhoushuaifeng
CC: dev
Subject: Re: Re: bulkLoadHFiles failed because write lock on HRegion
I am assuming your bulk loading involved multiple column families.


What hbase release are you using ?


Have you disabled major compaction ?


How often is the bulk load performed ?


Cheers



On Sun, May 18, 2014 at 8:17 AM, Frank Chow <zhoushuaifeng@gmail.com> wrote:

Thanks a lot, Ted.
I understand the write lock much better now.
But do you have any opinion on how to solve the timeout problem?





From: Ted Yu
Date: 2014-05-18 21:51
To: dev@hbase.apache.org; zhoushuaifeng
Subject: Re: bulkLoadHFiles failed because write lock on HRegion
Please take a look at the following JIRAs: 
HBASE-4552

HBASE-4716



Cheers



On Sat, May 17, 2014 at 11:30 PM, Frank Chow <zhoushuaifeng@gmail.com> wrote:

Hi guys,

Bulk loading HFiles sometimes fails on my cluster because of RPC timeouts.
I found that bulkLoadHFiles requires the write lock on the region when there are
multiple families, while the compact operation requires the read lock on the same region. When
the files being compacted are big, the compaction takes longer to finish (sometimes half an hour
or more). So while the region is compacting, bulkLoadHFiles cannot acquire the write lock,
and the bulk load times out.

Code is below:
HRegion:
public boolean bulkLoadHFiles(List<Pair<byte[], String>> familyPaths,
      BulkLoadListener bulkLoadListener) throws IOException {
    Preconditions.checkNotNull(familyPaths);
    // we need writeLock for multi-family bulk load
    startBulkRegionOperation(hasMultipleColumnFamilies(familyPaths));

private void startBulkRegionOperation(boolean writeLockNeeded)
      throws NotServingRegionException, RegionTooBusyException, InterruptedIOException {
    if (this.closing.get()) {
      throw new NotServingRegionException(regionInfo.getRegionNameAsString() +
          " is closing");
    }
    if (writeLockNeeded) lock(lock.writeLock());
    else lock(lock.readLock());

My question is: why is the write lock needed when there are multiple families? Would a read lock
also work? If only a read lock is needed, there would be no conflict and loading HFiles would not time out.
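As a sanity check on the lock semantics (a plain ReentrantReadWriteLock sketch, not HBase code): a second reader can always join while a first reader holds the lock, so if the bulk load only needed the read lock it would never wait on a running compaction.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadReadDemo {

    // A "compaction" thread holds the read lock; the caller (standing in
    // for a read-locked bulk load) tries to take the read lock too.
    // Returns true if the second reader got in without waiting.
    static boolean readersCoexist() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readLockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread compaction = new Thread(() -> {
            lock.readLock().lock();
            try {
                readLockHeld.countDown();
                release.await();          // keep holding the read lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        compaction.start();
        readLockHeld.await();

        boolean got = lock.readLock().tryLock(); // shared lock, no wait
        if (got) {
            lock.readLock().unlock();
        }
        release.countDown();
        compaction.join();
        return got;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second reader acquired: " + readersCoexist()); // true
    }
}
```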

Thanks.