hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7670) Synchronized operation in CatalogTracker would block handling ZK Event for long time
Date Sat, 26 Jan 2013 04:47:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563353#comment-13563353 ]

Hadoop QA commented on HBASE-7670:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12566484/HBASE-7670.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100 characters.

    {color:red}-1 core tests{color}.  The patch failed these unit tests:
                        org.apache.hadoop.hbase.TestZooKeeper
                        org.apache.hadoop.hbase.regionserver.TestPriorityRpc

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//console

This message is automatically generated.
                
> Synchronized operation in CatalogTracker would block handling ZK Event for long time
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-7670
>                 URL: https://issues.apache.org/jira/browse/HBASE-7670
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.4
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: HBASE-7670.patch
>
>
> In our testing we found that the master did not process ZK events for a long time.
> It seems that one ZK event-handling thread blocks them.
> Attaching some logs from the master:
> {code}
> 2013-01-16 22:18:55,667 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, 
> 2013-01-16 22:18:56,270 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, 
> ...
> 2013-01-16 23:55:33,259 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Retrying
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=100, exceptions:
>         at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:183)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:676)
>         at org.apache.hadoop.hbase.catalog.MetaReader.get(MetaReader.java:247)
>         at org.apache.hadoop.hbase.catalog.MetaReader.getRegion(MetaReader.java:349)
>         at org.apache.hadoop.hbase.catalog.MetaReader.readRegionLocation(MetaReader.java:289)
>         at org.apache.hadoop.hbase.catalog.MetaReader.getMetaRegionLocation(MetaReader.java:276)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:424)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:489)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:451)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:289)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-01-16 23:55:33,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Attempted to handle region transition for server but server is not online
> {code}
> Between 2013-01-16 22:18:56 and 2013-01-16 23:55:33, there were no logs at all about handling ZK events.
> {code}
> // ZK callback for znode deletion, run as part of ZK event handling
> this.metaNodeTracker = new MetaNodeTracker(zookeeper, throwableAborter) {
>   public void nodeDeleted(String path) {
>     if (!path.equals(node)) return;
>     ct.resetMetaLocation();  // must acquire the metaAvailable monitor
>   }
> };
>
> public void resetMetaLocation() {
>   LOG.debug("Current cached META location, " + metaLocation +
>     ", is not valid, resetting");
>   synchronized (this.metaAvailable) {
>     this.metaAvailable.set(false);
>     this.metaAvailable.notifyAll();
>   }
> }
>
> private AdminProtocol getMetaServerConnection() {
>   synchronized (metaAvailable) {
>     ...
>     // can keep retrying for a long time while the monitor is held
>     ServerName newLocation = MetaReader.getMetaRegionLocation(this);
>     ...
>   }
> }
> {code}
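> From the code above, we can see that nodeDeleted() waits on synchronized (metaAvailable) until MetaReader.getMetaRegionLocation(this) is done.
> However, getMetaRegionLocation() can keep retrying for a long time: the gap in the log above lasts about 1 hour 37 minutes and ends with a RetriesExhaustedException after 100 attempts.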
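The blocking chain described in the report can be reproduced in a small model. This is a hypothetical sketch under stated assumptions, not HBase code and not the attached HBASE-7670.patch (whose contents are not shown here). `CatalogLockDemo`, `slowLookup()`, `lookupHoldingLock()`, `lookupOutsideLock()` and `secondEventDelayMillis()` are invented names; `Thread.sleep` stands in for MetaReader.getMetaRegionLocation() retrying; and a single-threaded executor stands in for the ZooKeeper Java client's event thread, which delivers watcher callbacks one at a time. The model measures how long a later ZK event waits behind a nodeDeleted()-style reset.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical, simplified model of the pattern in the report; not HBase or
 * ZooKeeper code. slowLookup() stands in for a long-retrying META lookup, and
 * a single-threaded executor stands in for the ZK event thread.
 */
public class CatalogLockDemo {
    private final AtomicBoolean metaAvailable = new AtomicBoolean(true);
    private final CountDownLatch lookupStarted = new CountDownLatch(1);
    private final long lookupMillis;

    CatalogLockDemo(long lookupMillis) {
        this.lookupMillis = lookupMillis;
    }

    private String slowLookup() throws InterruptedException {
        lookupStarted.countDown();   // signal that the slow call has begun
        Thread.sleep(lookupMillis);  // simulated retries against an unreachable server
        return "meta-server";        // hypothetical location
    }

    /** Problematic shape: the monitor is held for the whole slow call. */
    String lookupHoldingLock() throws InterruptedException {
        synchronized (metaAvailable) {
            return slowLookup();
        }
    }

    /** Alternative shape: run the slow call first, then lock only to publish. */
    String lookupOutsideLock() throws InterruptedException {
        String location = slowLookup();
        synchronized (metaAvailable) {
            metaAvailable.set(true);
            metaAvailable.notifyAll();
        }
        return location;
    }

    /** Same shape as resetMetaLocation() in the quoted excerpt. */
    void resetMetaLocation() {
        synchronized (metaAvailable) {
            metaAvailable.set(false);
            metaAvailable.notifyAll();
        }
    }

    /**
     * Starts a lookup on a handler thread, then queues a nodeDeleted()-style
     * reset followed by a second event on the single event thread. Returns how
     * long (ms) the second event waited before it was handled.
     */
    static long secondEventDelayMillis(boolean holdLock, long lookupMillis)
            throws InterruptedException {
        CatalogLockDemo tracker = new CatalogLockDemo(lookupMillis);
        Thread handler = new Thread(() -> {
            try {
                if (holdLock) {
                    tracker.lookupHoldingLock();
                } else {
                    tracker.lookupOutsideLock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();
        tracker.lookupStarted.await();  // with holdLock, the monitor is now held

        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        CountDownLatch secondHandled = new CountDownLatch(1);
        long start = System.nanoTime();
        eventThread.execute(tracker::resetMetaLocation);  // like nodeDeleted()
        eventThread.execute(secondHandled::countDown);    // any later ZK event
        secondHandled.await();
        long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        eventThread.shutdown();
        handler.join();
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lookup holds lock:   next event waited ~"
                + secondEventDelayMillis(true, 1000) + " ms");
        System.out.println("lookup outside lock: next event waited ~"
                + secondEventDelayMillis(false, 1000) + " ms");
    }
}
```

With the monitor held across the lookup, the second event should wait roughly the full simulated lookup time, because the event thread is stuck inside the reset; with the lookup outside the monitor, it should be handled almost immediately. Moving the slow call out of the monitor is only one possible direction, not necessarily what the patch does: as written, lookupOutsideLock() can publish metaAvailable = true after a concurrent reset, so a real change would also need to handle that ordering.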

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
