eagle-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (EAGLE-971) Duplicated queues are generated under a monitored stream
Date Tue, 28 Mar 2017 12:32:41 GMT

    [ https://issues.apache.org/jira/browse/EAGLE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945052#comment-15945052 ]

ASF GitHub Bot commented on EAGLE-971:
--------------------------------------

GitHub user qingwen220 opened a pull request:

    https://github.com/apache/eagle/pull/895

    EAGLE-971: fix a bug where duplicated queues are generated under a monitored stream

    https://issues.apache.org/jira/browse/EAGLE-971

    New policies for alert spec generation:
    1. Each alert bolt has no more than 'coordinator.policiesPerBolt' policies.
    2. Each alert bolt has no more than 'coordinator.streamsPerBolt' queues when 'reuseBoltInStreams' is true.
    3. No two queues on the same alert bolt share the same StreamGroup.
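
For illustration only, the three rules above can be read roughly as the placement check sketched below. The method name, the representation of a queue's StreamGroup as a plain string, and the counters are simplifications for this sketch, not the actual Eagle coordinator API.

{code}
import java.util.List;

// Illustrative sketch of the three placement rules described in the pull request;
// NOT the actual Eagle coordinator code. A queue's StreamGroup is represented here
// by its partition key string for brevity.
public class BoltPlacementSketch {

    static boolean canPlacePolicyOnBolt(int policiesOnBolt,
                                        List<String> queueStreamGroupsOnBolt,
                                        String candidateStreamGroup,
                                        int policiesPerBolt,
                                        int streamsPerBolt,
                                        boolean reuseBoltInStreams) {
        // Rule 1: an alert bolt holds no more than 'coordinator.policiesPerBolt' policies.
        if (policiesOnBolt >= policiesPerBolt) {
            return false;
        }
        // Rule 3: no two queues on one bolt may share a StreamGroup, so an existing
        // queue with the same StreamGroup is reused rather than duplicated.
        if (queueStreamGroupsOnBolt.contains(candidateStreamGroup)) {
            return true;
        }
        // Rule 2: with 'reuseBoltInStreams' enabled, a bolt holds at most
        // 'coordinator.streamsPerBolt' distinct queues.
        return !reuseBoltInStreams || queueStreamGroupsOnBolt.size() < streamsPerBolt;
    }

    public static void main(String[] args) {
        // With policiesPerBolt = 2, streamsPerBolt = 3 and reuseBoltInStreams = true
        // (the settings used in the reproduction below), a second policy on the same
        // GROUPBY partition reuses the existing queue instead of creating a duplicate.
        System.out.println(canPlacePolicyOnBolt(
                1, List.of("GROUPBY(site,host,component,metric)"),
                "GROUPBY(site,host,component,metric)", 2, 3, true)); // prints: true
    }
}
{code}

Under these rules it is rule 3 that prevents the same GROUPBY partition from being added to one bolt as a second, duplicated queue.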


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/qingwen220/eagle EAGLE-971

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/eagle/pull/895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #895
    
----
commit c4950daa2bd6f1805664fab8593e95d5baaf2531
Author: Zhao, Qingwen <qingwzhao@apache.org>
Date:   2017-03-28T12:27:06Z

    fix a bug that duplicated queues are generated under a monitored stream

----


> Duplicated queues are generated under a monitored stream
> --------------------------------------------------------
>
>                 Key: EAGLE-971
>                 URL: https://issues.apache.org/jira/browse/EAGLE-971
>             Project: Eagle
>          Issue Type: Bug
>    Affects Versions: v0.5.0
>            Reporter: Zhao, Qingwen
>            Assignee: Zhao, Qingwen
>
> This issue is caused by an incorrect routing spec generated by the coordinator.
> Here is the procedure to reproduce it:
> 1. Set {{policiesPerBolt = 2, streamsPerBolt = 3, reuseBoltInStreams = true}} in the server config.
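> A minimal sketch of this setting as a HOCON-style server configuration block is shown below; nesting the keys under a {{coordinator}} block is an assumption based on the {{coordinator.*}} key names mentioned in the pull request description, not taken from the Eagle documentation.
> {code}
> coordinator {
>   policiesPerBolt = 2        # assumed nesting; step 1 above gives the raw key/value pairs
>   streamsPerBolt = 3
>   reuseBoltInStreams = true
> }
> {code}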
> 2. Create four policies that have the same partition and consume the same stream:
> {code}
> from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.rpc.callqueuelength"]#window.length(2) select site, host, component, metric, min(convert(value, "long")) as minValue group by site, host, component, metric having minValue >= 10000 insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_CALL_QUEUE_EXCEEDS_OUT;
> from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.rpc.callqueuelength"]#window.length(30) select site, host, component, metric, min(convert(value, "long")) as minValue group by site, host, component, metric having minValue >= 10000 insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_CALL_QUEUE_EXCEEDS_OUT;
> from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.hastate.failed.count"]#window.length(2) select site, host, component, metric, timestamp, min(value) as minValue group by site, host, component, metric insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_NN_NO_RESPONSE_OUT;
> from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.hastate.failed.count.test"]#window.length(3) select site, host, component, metric, count(value) as cnt group by site, host, component, metric insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_NN_NO_RESPONSE_OUT;
> {code}
> After creating the four policies, the routing spec is 
> {code}
> routerSpecs: [
>   {
>     streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
>     partition: {
>       streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
>       type: "GROUPBY",
>       columns: ["site", "host", "component", "metric"],
>       sortSpec: null
>     },
>     targetQueue: [
>       {
>         partition: {
>           streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
>           type: "GROUPBY",
>           columns: ["site", "host", "component", "metric"],
>           sortSpec: null
>         },
>         workers: [
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt9"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt0"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt1"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt2"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt3"}
>         ]
>       },
>       {
>         partition: {
>           streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
>           type: "GROUPBY",
>           columns: ["site", "host", "component", "metric"],
>           sortSpec: null
>         },
>         workers: [
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt9"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt0"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt1"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt2"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt3"}
>         ]
>       },
>       {
>         partition: {
>           streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
>           type: "GROUPBY",
>           columns: ["site", "host", "component", "metric"],
>           sortSpec: null
>         },
>         workers: [
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt9"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt0"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt1"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt2"},
>           {topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt3"}
>         ]
>       }
>     ]
>   }
> ]
> {code}
> and the alert spec is 
> {code}
> boltPolicyIdsMap: {
>   alertBolt9: ["NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times"],
>   alertBolt0: ["NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times"],
>   alertBolt1: ["NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times"],
>   alertBolt2: ["NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times"],
>   alertBolt3: ["NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times"]
> }
> {code}
> 3. Produce messages into the Kafka topic 'hadoop_jmx_metrics_sandbox' to trigger NameNodeWithOneNoResponse:
> {code}
> {"timestamp": 1490250963445, "metric": "hadoop.namenode.hastate.failed.count", "component": "namenode", "site": "artemislvs", "value": 0.0, "host": "localhost"}
> {code}
> As a result, a single message is delivered three times, once for each of the duplicated queues.
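
For reference, the sample event quoted above can be pushed to the topic with a minimal Kafka producer along the following lines. The broker address and the use of the kafka-clients String serializers are assumptions for this sketch and are not part of the original report.

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Minimal producer sketch for reproduction step 3; the broker address and the
// serializer choices are assumptions, not taken from the Eagle documentation.
public class JmxMetricProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumed local broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Sample event from the issue description above.
        String event = "{\"timestamp\": 1490250963445, "
                + "\"metric\": \"hadoop.namenode.hastate.failed.count\", "
                + "\"component\": \"namenode\", \"site\": \"artemislvs\", "
                + "\"value\": 0.0, \"host\": \"localhost\"}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send the event to the topic used in the reproduction steps.
            producer.send(new ProducerRecord<>("hadoop_jmx_metrics_sandbox", event));
            producer.flush();
        }
    }
}
{code}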



