hbase-issues mailing list archives

From "zhangduo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction
Date Sat, 21 Mar 2015 09:18:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372611#comment-14372611
] 

zhangduo commented on HBASE-13308:
----------------------------------

This is our 'compactAndBlockUntilDone' method.
{code:title=TestEndToEndSplitTransaction.java}
  public static void compactAndBlockUntilDone(Admin admin, HRegionServer rs, byte[] regionName)
      throws IOException, InterruptedException {
    log("Compacting region: " + Bytes.toStringBinary(regionName));
    admin.majorCompactRegion(regionName);
    log("blocking until compaction is complete: " + Bytes.toStringBinary(regionName));
    Threads.sleepWithoutInterrupt(500);
    while (rs.compactSplitThread.getCompactionQueueSize() > 0) {
      Threads.sleep(50);
    }
  }
{code}

It uses the size of the thread pool's work queue as the wait condition. But consider this small test program:
{code}
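import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// The class name is only a placeholder so that the snippet compiles on its own.
class QueueSizeDemo {
  public static void main(String[] args) throws InterruptedException {
    // Single worker thread with an unbounded work queue.
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    pool.execute(new Runnable() {

      @Override
      public void run() {
        try {
          // Block the worker until shutdownNow() interrupts it.
          Thread.currentThread().join();
        } catch (InterruptedException e) {
          // Expected on shutdownNow(); just let the task end.
        }
      }
    });
    // Give the worker time to take the task off the queue.
    Thread.sleep(2000);
    System.out.println(pool.getActiveCount());
    System.out.println(pool.getQueue().size());
    pool.shutdownNow();
  }
}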
{code}
The output is:
{noformat}
1
0
{noformat}
A thread pool's queue size does not include tasks that are currently running. So if there is
only one compaction and it is already running, the compaction queue size is zero and the wait
loop exits immediately.

So it is not safe to use the compaction queue size alone as the condition.
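
For illustration only, here is a minimal sketch of a stricter wait condition on a plain ThreadPoolExecutor: the pool counts as idle only when the queue is empty and no worker is running a task. The class PoolIdleCheck and its methods isIdle and waitUntilIdle are hypothetical names, not HBase code, and this is not necessarily how the test will be fixed. Note also that getActiveCount() is documented as an approximate value, so a brief window may remain between a worker taking a task from the queue and being counted as active.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of HBase.
final class PoolIdleCheck {

  // True only if nothing is waiting in the queue AND no worker is running a task.
  static boolean isIdle(ThreadPoolExecutor pool) {
    return pool.getQueue().isEmpty() && pool.getActiveCount() == 0;
  }

  // Polls until the pool reports idle. getActiveCount() is approximate,
  // so treat this as a sketch rather than a strict guarantee.
  static void waitUntilIdle(ThreadPoolExecutor pool, long pollMillis)
      throws InterruptedException {
    while (!isIdle(pool)) {
      Thread.sleep(pollMillis);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    pool.execute(() -> {
      started.countDown(); // signal that the task is now running
      try {
        release.await(); // stand-in for a long-running compaction
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    started.await();
    // The queue is empty here, yet the pool is still busy with the running task.
    System.out.println("queue empty: " + pool.getQueue().isEmpty());
    System.out.println("idle: " + isIdle(pool));
    release.countDown();
    waitUntilIdle(pool, 10);
    System.out.println("idle after completion: " + isIdle(pool));
    pool.shutdown();
  }
}
```

The two latches only make the demonstration deterministic: the first one guarantees that the task is already running before the pool is inspected, and the second one lets the task finish on demand.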

> Fix flaky TestEndToEndSplitTransaction
> --------------------------------------
>
>                 Key: HBASE-13308
>                 URL: https://issues.apache.org/jira/browse/HBASE-13308
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: zhangduo
>            Assignee: zhangduo
>
> https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/
> First, we split 'e9eb97847340ea7c6b9616d63d62a784' into 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'.
> And then, we try to split 'abe1973ea732066b12d8e33fce12a951'.
> {noformat}
> 2015-03-21 03:58:46,970 INFO  [Thread-191] regionserver.TestEndToEndSplitTransaction(399):
Initiating region split for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
> 2015-03-21 03:58:46,976 INFO  [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.RSRpcServices(1596):
Splitting testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
> 2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.CompactSplitThread(259):
Split requested for testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951..
 compaction_queue=(0:0), split_queue=1, merge_queue=0
> 2015-03-21 03:58:46,978 INFO  [Thread-191] regionserver.TestEndToEndSplitTransaction(399):
blocking until region is split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
> 2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(226):
Acquired a lock for /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:541770000000002
> 2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(328):
Released /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:541770000000002
> 2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use global event
loop group NioEventLoopGroup
> 2015-03-21 03:58:46,988 INFO  [RS:0;priapus:54177-splits-1426910324832] regionserver.SplitRequest(142):
Split transaction journal:
> 	STARTED at 1426910326977
> {noformat}
> We can see that it failed without any error message.
> I think this can only happen when the parent is not splittable or we cannot find a split row.
> {noformat}
> 2015-03-21 03:58:47,019 INFO  [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.HStore(1334):
Completed major compaction of 2 (all) file(s) in family of testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 12.6 K. This selection
was in queue for 0sec, and took 0sec to execute.
> 2015-03-21 03:58:47,019 INFO  [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(523):
Completed compaction: Request = regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.,
storeName=family, fileCount=2, fileSize=25.5 K, priority=1, time=14542808784655186; duration=0sec
> 2015-03-21 03:58:47,020 DEBUG [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(546):
CompactSplitThread Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
> {noformat}
> We can see that the compaction completed at 03:58:47,019, but the split started at 03:58:46,970, which is earlier.
> So the region still has a reference file and is not splittable.
> I think the problem is that 'compactAndBlockUntilDone' is not reliable: it may return before the compaction completes.
> Will try to prepare a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
