spark-issues mailing list archives

From "Patrick Wendell (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-2769) Ganglia Support Broken / Not working
Date Sun, 07 Sep 2014 00:17:28 GMT

     [ https://issues.apache.org/jira/browse/SPARK-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-2769:
-----------------------------------
    Affects Version/s:     (was: 1.0.0)
                       1.1.0

> Ganglia Support Broken / Not working
> ------------------------------------
>
>                 Key: SPARK-2769
>                 URL: https://issues.apache.org/jira/browse/SPARK-2769
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>         Environment: Linux Red Hat 6.4 on Spark 1.1.0
>            Reporter: Stephen Walsh
>              Labels: Ganglia, GraphiteSink, Metrics
>
> Hi all,
> I've built Spark 1.1.0 with sbt, with Ganglia enabled and Hadoop version 2.4.0.
> No issues there: Spark works fine on Hadoop 2.4.0, and Ganglia (GraphiteSink) is installed.
> I've added the following to metrics.properties:
> *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
> *.sink.graphite.host=HOSTNAME
> *.sink.graphite.port=8649
> *.sink.graphite.period=1
> *.sink.graphite.prefix=aa
> and I get this error message:
> 14/07/31 05:39:00 WARN graphite.GraphiteReporter: Unable to report to Graphite
> java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
>         at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
>         at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
>         at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
>         at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
>         at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
>         at java.io.BufferedWriter.flush(BufferedWriter.java:254)
>         at com.codahale.metrics.graphite.Graphite.send(Graphite.java:77)
>         at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:254)
>         at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:156)
>         at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:107)
>         at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:86)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> From looking at the code I see the following.
>   val graphite: Graphite = new Graphite(new InetSocketAddress(host, port))
>   val reporter: GraphiteReporter = GraphiteReporter.forRegistry(registry)
>       .convertDurationsTo(TimeUnit.MILLISECONDS)
>       .convertRatesTo(TimeUnit.SECONDS)
>       .prefixedWith(prefix)
>       .build(graphite)
> https://github.com/apache/spark/blob/87bd1f9ef7d547ee54a8a83214b45462e0751efb/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala#L69
> Followed by
> override def start() {
>     reporter.start(pollPeriod, pollUnit)
>   }
> I noticed that the error occurs when we first try to send a message, but nowhere do I see graphite.connect() being called?
> https://github.com/dropwizard/metrics/blob/master/metrics-graphite/src/main/java/com/codahale/metrics/graphite/Graphite.java#L62
> It seems to fail in the send function:
> https://github.com/dropwizard/metrics/blob/master/metrics-graphite/src/main/java/com/codahale/metrics/graphite/Graphite.java#L77
> With "this.writer" not initialized, the "writer.write" call will fail.
> The GraphiteBuilder doesn't call it either when creating the "reporter" object.
> https://github.com/dropwizard/metrics/blob/master/metrics-graphite/src/main/java/com/codahale/metrics/graphite/GraphiteReporter.java#L113
> Maybe I'm looking in the wrong area and I'm passing in the wrong values, but the very sparse logging has me thinking it is a bug.
> EDIT:
> found out where the connect gets called.
> https://github.com/dropwizard/metrics/blob/master/metrics-graphite/src/main/java/com/codahale/metrics/graphite/GraphiteReporter.java#L153
> and this is called from here
> https://github.com/dropwizard/metrics/blob/99dc540c2cbe6bb3be304e20449fb641c7f5382a/metrics-core/src/main/java/com/codahale/metrics/ScheduledReporter.java#L98
> which is called from here
> https://github.com/dropwizard/metrics/blob/99dc540c2cbe6bb3be304e20449fb641c7f5382a/metrics-core/src/main/java/com/codahale/metrics/ScheduledReporter.java#L98
> but the issue still stands. :/
> Edit 2:
> my ports are open and listening
> [root@rtr-dev-spark4 ~]# lsof -i :8649
> COMMAND   PID    USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> gmond   32173 ganglia    5u  IPv4 3480253      0t0  UDP rtr-dev-spark4.ord2012:8649
> gmond   32173 ganglia    6u  IPv4 3480255      0t0  TCP rtr-dev-spark4.ord2012:8649 (LISTEN)
> gmond   32173 ganglia    7u  IPv4 3480257      0t0  UDP rtr-dev-spark4.ord2012:55523->rtr-dev-spark4.ord2012:8649
> Regards
> Steve
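
One way to narrow down a report like the one above is to take Spark out of the picture and write to the same host and port by hand. The sketch below is not part of Spark or the Dropwizard metrics library; `GraphiteProbe`, `formatLine`, and `send` are hypothetical names used only for illustration. It assumes the endpoint is expected to accept the Graphite plaintext protocol (one `name value epoch-seconds` line per metric over TCP), which is what the `Graphite.send` code linked in the issue writes.

```scala
import java.io.{BufferedWriter, OutputStreamWriter}
import java.net.{InetSocketAddress, Socket}
import java.nio.charset.StandardCharsets
import scala.util.Try

// Hypothetical helper, for illustration only.
object GraphiteProbe {

  // Graphite plaintext protocol: "<name> <value> <epoch-seconds>\n"
  def formatLine(name: String, value: String, epochSeconds: Long): String =
    s"$name $value $epochSeconds\n"

  // Connects over TCP, writes each line, and flushes after every write so
  // that a failure surfaces on the write that caused it.
  // Returns Failure(exception) if connecting or writing fails.
  def send(host: String, port: Int, lines: Seq[String], timeoutMs: Int = 2000): Try[Unit] =
    Try {
      val socket = new Socket()
      try {
        socket.connect(new InetSocketAddress(host, port), timeoutMs)
        val writer = new BufferedWriter(
          new OutputStreamWriter(socket.getOutputStream, StandardCharsets.UTF_8))
        lines.foreach { line =>
          writer.write(line)
          writer.flush()
        }
      } finally {
        socket.close()
      }
    }
}
```

For example, `GraphiteProbe.send("HOSTNAME", 8649, Seq.fill(3)(GraphiteProbe.formatLine("aa.probe", "1", System.currentTimeMillis() / 1000)))` uses the host placeholder, port, and prefix from the metrics.properties above. If this also returns a `Failure` wrapping a `SocketException`, the problem likely lies with whatever is listening on that port rather than with how Spark starts the reporter; if it succeeds, the Spark side deserves a closer look.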



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


