hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
Date Sat, 17 Jan 2015 04:29:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281204#comment-14281204 ]

Hadoop QA commented on HBASE-10499:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12692895/10499-v2.txt
  against master branch at commit 092c91eb0fc2a6b4044183e9ece71dd03711045d.
  ATTACHMENT ID: 12692895

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified
tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the total number
of checkstyle errors.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100.

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestFlushRegionEntry

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//artifact/patchprocess/checkstyle-aggregate.html

  Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/12498//console

This message is automatically generated.

> In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10499
>                 URL: https://issues.apache.org/jira/browse/HBASE-10499
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: 10499-v2.txt, HBASE-10499.patch, compaction-queue.png, hbase-root-master-ip-10-157-0-229.zip, hbase-root-regionserver-ip-10-93-128-92.zip, master_4e39.log, master_576f.log, rs_4e39.log, rs_576f.log, t1.dump, t2.dump, workloada_0.98.dat
>
>
> I got this while testing the 0.98 RC, but I am not sure if it is specific to this version; it doesn't seem so to me.
> It is also similar to HBASE-5312 and HBASE-5568.
> Using 10 threads I do writes to 4 region servers using YCSB. The table created has 200 regions. In one of the runs with a 0.98 server and a 0.98 client I hit this problem: the number of hlogs kept growing and the system requested flushes for that many regions.
> One by one everything was flushed except one region, which remained unflushed. The ripple effect of this on the client side:
> {code}
> com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
>         at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245)
>         at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73)
>         at com.yahoo.ycsb.ClientThread.run(Client.java:307)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
>         at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187)
>         at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171)
>         at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897)
>         at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961)
>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225)
>         at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232)
>         ... 2 more
> {code}
> On one of the RS
> {code}
> 2014-02-11 08:45:58,714 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5
> ......................................................
> 2014-02-11 08:47:49,580 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39
> {code}
> {code}
> 2014-02-11 09:42:44,237 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689
> 2014-02-11 09:42:44,237 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868
> 2014-02-11 09:42:54,238 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847
> 2014-02-11 09:42:54,238 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 20099
> 2014-02-11 09:43:04,238 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 8677
> {code}
> {code}
> 2014-02-11 10:31:21,020 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=54, maxlogs=32; forcing flush of 1 regions(s): fdbb3242d3b673bbe4790a47bc30576f
> {code}
> I restarted another RS and there were region movements of other regions, but this region stayed with the RS that has the issue. One important observation is that in HRegion.internalFlushcache() we need to add a debug log here:
> {code}
>     // If nothing to flush, return and avoid logging start/stop flush.
>     if (this.memstoreSize.get() <= 0) {
>       return false;
>     }
> {code}
> We can see that a flush is requested for the region but it does not happen, and no flush-related messages are printed in the logs. So for some reason this memstoreSize has become 0 (I assume this). The earlier bugs were also due to a similar reason.
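
For reference, here is a minimal sketch of the debug log proposed in the description above. It assumes the 0.98-era HRegion field and method names (memstoreSize, LOG, getRegionInfo()); the message wording is an illustration, not the committed patch.

{code}
// Sketch: the early-return path quoted above, extended with a debug line so
// that silently skipped flushes leave a trace in the region server log.
// The log message text itself is hypothetical.
// If nothing to flush, return and avoid logging start/stop flush.
if (this.memstoreSize.get() <= 0) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Skipping flush of " + getRegionInfo().getEncodedName()
        + " because memstoreSize=" + this.memstoreSize.get());
  }
  return false;
}
{code}

With a line like this in place, periodicFlusher requests that never turn into an actual flush (as in the logs above) would at least show that the early return was taken and what memstoreSize was at that moment.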



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
