phoenix-dev mailing list archives

From "Samarth Jain (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3062) JMXCacheBuster restarting the metrics system causes PhoenixTracingEndToEndIT to hang
Date Thu, 02 Mar 2017 21:52:45 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893092#comment-15893092 ]

Samarth Jain commented on PHOENIX-3062:
---------------------------------------

Oops, yes. I meant HBASE-16211.

[~jamestaylor] - I think this might be an actual issue. In org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl
we schedule the JmxCacheBuster to clear out the JMX cache every 5 minutes.

{code}
// Every few mins clean the JMX cache.
executor.getExecutor().scheduleWithFixedDelay(new Runnable() {
  public void run() {
    JmxCacheBuster.clearJmxCache();
  }
}, 5, 5, TimeUnit.MINUTES);
{code}

Before HBASE-16211, JmxCacheBuster.clearJmxCache() would simply restart (!!) the entire
metrics system.

{code}
try {
  if (DefaultMetricsSystem.instance() != null) {
    DefaultMetricsSystem.instance().stop();
    // Sleep some time so that the rest of the hadoop metrics
    // system knows that things are done
    Thread.sleep(500);
    DefaultMetricsSystem.instance().start();
  }
} catch (Exception exception) {
  LOG.debug("error clearing the jmx it appears the metrics system hasn't been started",
      exception);
}
{code}

Stopping the metrics system internally stops all the sinks and clears the map in which it
keeps references to those sinks.

{code}
private synchronized void stopSinks() {
  for (Entry<String, MetricsSinkAdapter> entry : sinks.entrySet()) {
    MetricsSinkAdapter sa = entry.getValue();
    LOG.debug("Stopping metrics sink " + entry.getKey() +
        ": class=" + sa.sink().getClass());
    sa.stop();
  }
  sinks.clear();
}
{code}

This means the start() method in the MetricsSystem doesn't know which sinks it should be
re-registering. So even if PhoenixMetricsSink was registered, within at most 5 minutes
it would be removed by the JmxCacheBuster via MetricsRegionAggregateSourceImpl, making tracing
unusable.
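
To make the failure mode concrete, here is a minimal toy model in plain Java. It is not the Hadoop or HBase code: ToyMetricsSystem and all of its methods are hypothetical names, and the sink is just a Runnable. It only illustrates the sequence described above, assuming stop() clears the sink map and start() has nothing to re-register from.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (NOT the Hadoop metrics2 implementation) of a metrics system
 * that forgets its sinks when it is stopped. All names are hypothetical.
 */
class ToyMetricsSystem {
    // Registered sinks by name; stands in for the real sink adapters.
    private final Map<String, Runnable> sinks = new LinkedHashMap<>();
    private boolean running;

    synchronized void register(String name, Runnable sink) {
        sinks.put(name, sink);
    }

    synchronized void start() {
        // Assumption: start() keeps no record of earlier registrations,
        // so it cannot bring back sinks that stop() dropped.
        running = true;
    }

    synchronized void stop() {
        running = false;
        sinks.clear(); // mirrors the sinks.clear() call in stopSinks()
    }

    synchronized int sinkCount() {
        return sinks.size();
    }

    /** Registers one sink, then restarts the way the old cache buster did. */
    static int sinksAfterRestart() {
        ToyMetricsSystem ms = new ToyMetricsSystem();
        ms.start();
        ms.register("phoenix-tracing", () -> { });
        ms.stop();
        ms.start();
        return ms.sinkCount();
    }

    public static void main(String[] args) {
        // prints "sinks after restart: 0"
        System.out.println("sinks after restart: " + sinksAfterRestart());
    }
}
```

In this model the only way to get the sink back is for its owner to call register() again after every restart, which the test has no hook for.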

I am not too sure how classes like MetricsRegionAggregateSourceImpl and MetricsReplicationSourceSourceImpl
are used. I am guessing they have to do with publishing various internal HBase metrics via
JMX.

Probably [~elserj] or [~enis] would know? 


> JMXCacheBuster restarting the metrics system causes PhoenixTracingEndToEndIT to hang
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3062
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3062
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 4.10.0
>
>         Attachments: phoenix-3062_v1.patch
>
>
> With some recent fixes in the HBase metrics system, we are now effectively restarting
> the metrics system (in HBase-1.3.0, probably not affecting 1.2.0). Since we use a custom sink
> in the PhoenixTracingEndToEndIT, restarting the metrics system loses the registered sink, thus
> causing a hang.
> We need a fix in HBase and in Phoenix so that we will not restart the metrics system during tests.
> Thanks to [~sergey.soldatov] for analyzing the initial root cause of the hang.
> See HBASE-14166 and others.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
