www-infrastructure-dev mailing list archives

From Christofer Dutz <christofer.d...@c-ware.de>
Subject AW: Tool proposal for helping run and monitor the ASF Infra Services
Date Wed, 31 Aug 2016 08:48:31 GMT
Hi Guys


Ok ... turned out that it was just Java VMs being started and killed a few milliseconds later.

I guess it doesn't really make sense to monitor the Java VMs on a CI Server like that ;-)
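
To illustrate the point about short-lived JVMs, here is a minimal sketch of a "grace period" check. It assumes a hypothetical `ProcessInfo` record and a made-up 5 second threshold; it is not how the Instana agent is implemented or configured, just one way to express "ignore processes that die right away".

```python
from dataclasses import dataclass


@dataclass
class ProcessInfo:
    """Hypothetical record for a discovered process; not an Instana type."""
    pid: int
    started_at: float  # seconds since the epoch


def should_monitor(proc: ProcessInfo, now: float, min_age_seconds: float = 5.0) -> bool:
    """Return True only once the process has been alive for min_age_seconds."""
    return (now - proc.started_at) >= min_age_seconds


# A JVM forked by a build step 50 ms ago is skipped; one running for a minute is not.
short_lived = ProcessInfo(pid=101, started_at=1000.0)
long_running = ProcessInfo(pid=102, started_at=940.0)
print(should_monitor(short_lived, now=1000.05))   # False
print(should_monitor(long_running, now=1000.05))  # True
```

On a CI server most forked JVMs would never pass such a check, which is why monitoring them there adds little.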


I updated the config and re-started it ... also I started the agent on my private server (Ubuntu)
so you can see a little more data.


Next I'll try to set up a Puppet script for the instana-agent and create a pull request for that.


Anyone else here interested in an account? I already invited Chris L. ... would be happy to
add more.

One more thing: once you are signed in, go to User -> Settings and clear the checkbox
"Disable Unmonitored Hosts". After that, Instana will show connections from the server to other,
unmonitored hosts.


Chris

________________________________
From: Christofer Dutz <christofer.dutz@c-ware.de>
Sent: Monday, 29 August 2016 10:28:27
To: infrastructure-dev@apache.org
Subject: AW: Tool proposal for helping run and monitor the ASF Infra Services

Ok so it seems that this was just some log output that shouldn't interfere with the build.
Having a look at the build, the output is the same as without the agent. The only difference
was the text output in between, and I agree this should not happen. I forwarded the issue to
Instana and the guys there are investigating it.


But I'll keep the agent off for now ... don't want to cause confusion.


Chris

________________________________
From: Christofer Dutz <christofer.dutz@c-ware.de>
Sent: Monday, 29 August 2016 09:43:38
To: infrastructure-dev@apache.org
Subject: AW: Tool proposal for helping run and monitor the ASF Infra Services

I turned off the agent immediately ... things should be back to normal now.


Sorry for that. I forwarded the information to the Instana guys.


Chris

________________________________
From: sebb <sebbaz@gmail.com>
Sent: Sunday, 28 August 2016 14:25:37
To: infrastructure-dev@apache.org
Subject: Re: Tool proposal for helping run and monitor the ASF Infra Services

The installation seems to be causing problems for Windows builds, see
for example:

https://builds.apache.org/job/JMeter-Windows/66/console

Sample error message below.

Please can this be fixed?

[...truncated 1095 lines...]
        at com.instana.agent.loader.v1_2_22.AgentLoader.setupAgent(AgentLoader.java:76)
        at com.instana.agent.loader.v1_2_22.AgentLoader.agentmain(AgentLoader.java:60)
        at com.instana.agent.loader.v1_2_22.AgentLoader.agentmain(AgentLoader.java:43)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:382)
        at sun.instrument.InstrumentationImpl.loadClassAndCallAgentmain(InstrumentationImpl.java:407)
Could not start Instana agent:
C:\Users\CHRIST~1\AppData\Local\Temp\instana-instrumentation-boot-1.1.11.jar
(The system cannot find the file specified)
java.io.FileNotFoundException:
C:\Users\CHRIST~1\AppData\Local\Temp\instana-instrumentation-boot-1.1.11.jar
(The system cannot find the file specified)
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:214)
        at java.util.zip.ZipFile.<init>(ZipFile.java:144)
        at java.util.jar.JarFile.<init>(JarFile.java:152)
        at java.util.jar.JarFile.<init>(JarFile.java:131)
        at com.instana.agent.loader.v1_2_22.AgentLoader.setupAgent(AgentLoader.java:76)
        at com.instana.agent.loader.v1_2_22.AgentLoader.agentmain(AgentLoader.java:60)
        at com.instana.agent.loader.v1_2_22.AgentLoader.agentmain(AgentLoader.java:43)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:382)
        at sun.instrument.InstrumentationImpl.loadClassAndCallAgentmain(InstrumentationImpl.java:407)
   [client] summary +    132 in 00:00:21 =    6.2/s Avg:   139 Min:     1 Max:   255 Err:    16 (12.12%) Active: 0 Started: 11 Finished: 11
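
For reference, the error percentage in the summary line above can be checked from the sample and error counts. The following is a minimal sketch, assuming the line keeps the layout shown in this console output (it looks like JMeter's summariser format, but the regular expression and the `error_rate` helper are my own and not part of JMeter):

```python
import re

# Summary line copied from the console output above (re-joined onto one line).
SUMMARY = (
    "[client] summary +    132 in 00:00:21 =    6.2/s Avg:   139 Min:     1 "
    "Max:   255 Err:    16 (12.12%) Active: 0 Started: 11 Finished: 11"
)

# Capture the sample count after "summary +" and the error count after "Err:".
PATTERN = re.compile(
    r"summary \+\s+(?P<samples>\d+) in .*?Err:\s+(?P<errors>\d+) \((?P<pct>[\d.]+)%\)"
)


def error_rate(line: str) -> float:
    """Return errors / samples as a percentage for one summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a recognised summary line")
    return 100.0 * int(match.group("errors")) / int(match.group("samples"))


print(f"{error_rate(SUMMARY):.2f}%")  # 12.12%
```

So 16 of the 132 samples in that interval failed, which matches the 12.12% reported in the line itself.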

On 27 August 2016 at 18:29, Christofer Dutz <christofer.dutz@c-ware.de> wrote:
> Hi Chris,
>
>
> Ok so I started the agent ... the Windows build agent is rather unexciting as it only
> has a handful of Java processes and an MS SQL Server with all networking shut down. When it encounters
> software that is explicitly instrumented, such as an Apache web server or Tomcat, the introspection
> will provide more insight.
>
>
> Eventually I could whip up a Puppet script which you could use to set up the Instana agent
> on a Linux machine. Then you should be able to set up an agent on one of the more interesting
> machines. Are there any test VMs I could test such a script on?
>
>
> I just sent you (Chris L.) an invitation to the Instana Web Console, so feel free to
sign up. If anyone else is interested, I'll be glad to invite him/her too :-)
>
>
> The "Datacenter A / Rack 42" is just a dummy as I don't have any clue which datacenter
> the machine/VM is running in.
>
>
> Chris
>
> ________________________________
> From: Chris Lambertus <cml@apache.org>
> Sent: Saturday, 27 August 2016 08:20:15
> To: infrastructure-dev@apache.org
> Cc: Christofer Dutz
> Subject: Re: Tool proposal for helping run and monitor the ASF Infra Services
>
> Chris,
>
> We’re interested, but we don’t really have the bandwidth to set up evaluations for
tools like this right now. If you truly believe it’s worthwhile, then go ahead and fire
up the agent on hudson-win and let’s see what we get out of it. If it doesn’t require
any elevated permissions, I’m happy to see a live demo.
>
>
>
>
> On Aug 23, 2016, at 11:46 PM, Christofer Dutz <christofer.dutz@c-ware.de> wrote:
>
> Hi all,
>
>
> How about trying it out for one chain of services? I bet that would explain a lot more
> than anything else. As I mentioned, the SaaS account is already set up and I can create some
> user accounts for you guys.
>
>
> On the windows1 machine (hudson-win.apache.org, in
> directory "f:\instana-agent") I have the agent installed (but inactive) so you could have
> a look at the agent itself.
>
>
> We could fire up the agent and start a Jenkins build and it will definitely show us communication
> with the Jenkins machine, repository.apache.org and the
> proxy (well, probably with the proxy). So if we set up the agent on Windows, the proxy and repository.apache.org,
> I guess this should give a first impression. And if you don't like or want it in the end,
> just stop and delete the agent and no harm is done. What do you think?
>
>
> I would be glad to assist you in this.
>
>
> Chris
>
> ________________________________
> From: Mirko Novakovic <mirko.novakovic@codecentric.de>
> Sent: Tuesday, 23 August 2016 16:04:52
> To: infrastructure-dev@apache.org
> Subject: Re: AW: Tool proposal for helping run and monitor the ASF Infra Services
>
> Hi Daniel,
>
> I am the Instana CEO and happy to answer your questions. First of all I would like to
> point out that we are an APM company/product, so more in the space of NewRelic than DataDog.
> But as we are monitoring the application as a whole (and model a dynamic dependency graph
> for it) we also monitor the infrastructure. We also have out-of-the-box dashboards etc., but the
> core is our distributed tracing and service quality management, all automatic and without
> configuration.
>
> Regarding the questions:
>
> - I can guarantee that there is no time limit on this offering for Apache - the only
thing we should discuss is the number of components you would monitor, as it is SAAS there
should be some maximum defined so that the costs for us do not explode. If you want to have
Instana On Premise we wouldn’t need that limit.
>
> - Our whole system is streaming based, so we stream the data from the agent to the server
> and from the server the data is pushed to the UI. On the backend we build a graph and apply
> a knowledge base on each component to find changes and issues in real time. Based on service
> KPIs (Four Golden Signals by Google) we identify if the changes/issues are having an impact on
> the service quality; if yes, we report. The algorithms are based on machine learning, so we
> see things like sudden drops, slow responses, high error rates, etc.
>
> - Don’t know Circonus in detail, so I cannot really compare it. Instana for sure is
> not time consuming as it was designed for auto discovery and zero configuration.
>
> - We have different types of integration using API and SDK. For alarms we have out-of-the-box
> integrations for PagerDuty, OpsGenie and Slack, but also provide a generic webhook
> to integrate with any other system that has an API.
>
> - The agent is Apache Karaf based and can update itself using a private or public sensor
> (plugin) repository that is based on Apache Maven technology. We provide a set of sensors
> for approx. 40 technologies right now but enable users to extend this with their own sensors.
>
> We can give you more insights or a demo at any time.
>
> Best regards
> Mirko
>
>
> On 2016-08-19 09:16 (+0200), Daniel Gruno <h...@apache.org> wrote:
> On 08/19/2016 08:20 AM, Christofer Dutz wrote:
> Hi Chris,
>
> I knew that someone asked exactly the "how does it compare to datadog" question somewhere.
> Here's the link to that thread: https://news.ycombinator.com/item?id=12147219
>
> And I can confirm the shortcomings of the time series approach, because in Jenkins, I'd
> say about 70% of recent failures of Flex builds were due to timeouts when uploading Maven
> artifacts to Nexus. The current solution doesn't seem to detect that. On top of that, I couldn't
> see any HipChat notifications. The infra guys always had to start looking for the real reason
> for the timeouts, as Nexus wasn't having any problems at all.
>
> I'd say the real reason we weren't being notified is because httpd
> wasn't telling us it was jammed, so it would have made little difference
> whether we used product X or Y to monitor it, when it wasn't sending out
> data that could lead us to what the problem was. It wasn't a question of
> granularity, the data just wasn't enabled for any agent to see.
>
> I would imagine this would be something _instead_ of datadog, which
> leads me to some questions:
>
> - For how long into the future could we have a guarantee that this isn't
> gonna cost us $$$/year or whatever the price would end up being with a
> non-free version?
>
> - What is understood by real-time checks here, and what exactly is checked?
>
> - How does this relate to more advanced monitoring systems like
> Circonus? As you may know, we had to drop that as it proved to be rather
> time consuming.
>
> - What sort of integrations does this system have? How are alerts
> dispatched?
>
> I'd also be interested in learning about plugins and how to customize
> the agents.
>
> With regards,
> Daniel.
>
>
> And I really love the feature of tracking down the response time for one service back
> to other servers to find the real reason for a system being slow (have a look at the presentation
> for this; there's a great slide on it).
>
> Chris
>
>
> Sent from my Samsung Galaxy smartphone.
>
>
> -------- Original message --------
> From: Chris Lambertus <cm...@apache.org>
> Date: 19.08.16 07:03 (GMT+01:00)
> To: infrastructure-dev@apache.org, Christofer Dutz <ch...@c-ware.de>
> Cc: mirko.novakovic@codecentric.de
> Subject: Re: Tool proposal for helping run and monitor the ASF Infra Services
>
>
>
> Hiya Chris,
>
> Thanks for the info and the legwork on this. We currently use DataDog, which is very
> similar to what Instana appears to provide: an agent-based monitoring solution that gives us
> that kind of look into our infra. We also have a number of internal tools that report on various
> goings-on as well. You might see some of this in #asfinfra on HipChat from SNMP2HipChat. DataDog
> also reports various problems there, as does our monitoring via PingMyBox.
>
> Since you're not root@, you may not see some of the stuff that we see, but I think by
> and large, the majority of the monitors do direct to #asfinfra. Have you noticed gaps in the
> monitoring? Since we moved to DataDog, we've been quite happy with the resolution and metrics
> we've been able to get. It's been on my back burner for a while to expose some of our DD dashboards
> as public, but for right now it's somewhat limited access. In the interests of transparency
> (but not at the expense of security), I'd be happy to work with you to expose more of this,
> and I'm happy to address any questions or concerns about shortcomings in our monitoring.
>
> Many thanks to Instana for offering the ASF free services! I'd definitely like to hear
> more about what they might be able to offer on top of what we already get from DataDog. I'll
> take a look at the info you sent out. Please feel free to follow up with me directly, either
> via email or HipChat.
>
> Cheers,
> -Chris
>
>
>
>
> On Aug 18, 2016, at 2:13 AM, Christofer Dutz <ch...@c-ware.de> wrote:
>
> Hi,
>
>
>
> I have been on the Infra HipChat for a few weeks now while trying to migrate the Flex
> project to Maven and back to the ASF Infra build system. Thanks for your support in this and
> even more thanks for the trust in granting me access and Admin rights on the windows1 build
> agent.
>
>
>
> In the chat I observed the daily work of you guys, having to maintain quite a zoo of
> all sorts of different systems on different platforms. Some problems you were having seem
> quite easy to track down ... if the hard disk is full, you clean up. But not all problems
> are that easy to track down. Thinking of the problems with repository.apache.org
> ... here the cause was the proxy being flooded with connections (I think this was the case)
> ... regular restarts of this helped temporarily, but I don't think that helps in the long
> term as no one had an idea why those connections were hanging there in the first place.
>
>
>
> A few years ago the company I work for, codecentric, founded a company called
> Instana. They are developing an agent-based system for monitoring IT infrastructure. In contrast
> to most established solutions, they use machine learning strategies to analyze the root cause
> of problems. While you can probably achieve similar results with normal tools, the problem
> is that you need very detailed domain knowledge to do so, and in a regularly changing environment
> you need to continuously keep adjusting your metrics. Instana does this automatically. I think
> you can imagine how tricky it is to trace the root cause of bad response times through a
> network of interconnected services.
>
>
>
> Investing almost all of my free time (and a lot of my paid time) in Apache, and noticing
> a lot of the problems you have to deal with every day, I asked Instana if they would be willing
> to provide their service to the ASF for free, and they agreed and immediately set up a dedicated
> instance.
>
>
>
> I wanted to try the thing out as I would prefer to grab a few beers with you at ApacheCon
> in Seville and not get punched in the face for recommending something bad ;-) ... so I tried
> this on my private server playground. I unpacked and started the agent and the host appeared
> on the web console and reported the problems it was having (ones I didn't even know about)
> as well as other systems it communicates with ... as soon as I added agents on these machines
> the analytics started doing their work across systems and I built up a map view of my services
> and their correlation. So it's really a system that needs almost no configuration at all :-)
>
>
> I uploaded the internal product presentation here: https://public.centerdevice.de/1a9dc4ed-515e-482e-9fd6-6d60a5562598
> (please don't share this outside of the ASF)
>
> Please use the password: 4p4cheR0cks (I'll remove that document in about two weeks)
>
>
> By the way ... the screenshots in the presentation are real ... I was amazed to see
> a 3D web UI in production for the first time ;-)
>
>
>
> So if there is any interest in this offer, I would be more than happy to provide credentials
> to you and assist you in getting started, so you could easily try it out. The guys at Instana
> would also be delighted to give you guys an online demo and answer any questions you might
> have. Feel free to contact Mirko directly for this: mirko.novakovic@codecentric.de
>
>
>
> Chris
>
>
>
>
>
> Mirko Novakovic | Executive Board
>
> codecentric AG | Merscheider Straße 1 | 42699 Solingen | Germany
> tel: +49 (0) 212.23362811 | fax: +49 (0) 212.23362879 | mobile: +49 (0) 163.6681500
> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de
>
> Registered office: Solingen | HRB 25917 | Local court: Wuppertal
> Executive board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
> Supervisory board: Patric Fedlmeier (chairman) . Klaus Jäger . Jürgen Schütz
>
>
