From: Christofer Dutz
To: "infrastructure-dev@apache.org"
Subject: AW: Tool proposal for helping run and monitor the ASF Infra Services
Date: Sat, 27 Aug 2016 19:07:25 +0000

Hi Chris,

that sounds like a good option. A test Jira should provide several services that Instana can monitor.
I'll be consumed by some paid work on Monday and Tuesday and I'll try to get started on this on Wednesday.
So I guess you can be expecting some questions from me then ;-)

Chris

________________________________
From: Chris Lambertus
Sent: Saturday, 27 August 2016 21:01:07
To: infrastructure-dev@apache.org
Subject: Re: Tool proposal for helping run and monitor the ASF Infra Services

I have a few test VMs - if you want to develop a puppet module for this in the same style as we've set up datadog, that would be excellent. I think our Jira test instance would be a good candidate for this on the linux side. Feel free to hit me up on hipchat or direct email to discuss particulars about the puppet setup.

Thanks,
-Chris

> On Aug 27, 2016, at 10:29 AM, Christofer Dutz wrote:
>
> Hi Chris,
>
> Ok so I started the Agent ... the Windows build agent is rather unexciting as it only has a handful of Java processes and a MsSQL Server with all network shut down. When encountering software that is explicitly instrumented, such as an Apache Webserver or Tomcat, the introspection will provide more insight.
>
> Eventually I could whip up a puppet script which you could use to set up the Instana Agent on a linux machine. Then you should be able to set up an agent on one of the more interesting machines. Are there any test VMs I could test such a script on?
>
> I just sent you (Chris L.) an invitation to the Instana Web Console, so feel free to sign up. If anyone else is interested, I'll be glad to invite him/her too :-)
>
> The "Datacenter A / Rack 42" is just a dummy as I don't have any clue about what datacenter the machine/vm is running on.
>
> Chris
>
> ________________________________
> From: Chris Lambertus
> Sent: Saturday, 27 August 2016 08:20:15
> To: infrastructure-dev@apache.org
> Cc: Christofer Dutz
> Subject: Re: Tool proposal for helping run and monitor the ASF Infra Services
>
> Chris,
>
> We're interested, but we don't really have the bandwidth to set up evaluations for tools like this right now.
> If you truly believe it's worthwhile, then go ahead and fire up the agent on hudson-win and let's see what we get out of it. If it doesn't require any elevated permissions, I'm happy to see a live demo.
>
> On Aug 23, 2016, at 11:46 PM, Christofer Dutz wrote:
>
> Hi all,
>
> How about trying it out for one chain of services? I bet that would explain a lot more than anything else. As I mentioned, the SaaS account is already set up and I can create some user accounts for you guys.
>
> On the windows1 machine (hudson-win.apache.org, in directory "f:\instana-agent") I have the agent installed (but inactive) so you could have a look at the agent itself.
>
> We could fire up the agent and start a Jenkins build and it will definitely show us communication with the Jenkins machine, repository.apache.org and the proxy (well, probably with the proxy). So if we set up the agent on windows, the proxy and repository.apache.org, I guess this should give a first impression. And if you don't like or want it in the end, just stop and delete the agent and no harm is done. What do you think?
>
> I would be glad to assist you in this.
>
> Chris
>
> ________________________________
> From: Mirko Novakovic
> Sent: Tuesday, 23 August 2016 16:04:52
> To: infrastructure-dev@apache.org
> Subject: Re: AW: Tool proposal for helping run and monitor the ASF Infra Services
>
> Hi Daniel,
>
> I am the Instana CEO and happy to answer your questions. First of all I would like to point out that we are an APM company/product, so more in the space of NewRelic than DataDog. But as we are monitoring the application as a whole (and model a dynamic dependency graph for it) we also monitor the infrastructure. We have out-of-the-box dashboards etc., but the core is our distributed tracing and service quality management - all automatic and without configuration.
>
> Regarding the questions:
>
> - I can guarantee that there is no time limit on this offering for Apache - the only thing we should discuss is the number of components you would monitor; as it is SaaS there should be some maximum defined so that the costs for us do not explode. If you want to have Instana On Premise we wouldn't need that limit.
>
> - Our whole system is streaming based, so we stream the data from the agent to the server and from the server the data is pushed to the UI. On the backend we build a graph and apply a knowledge base on each component to find changes and issues in realtime. Based on service KPIs (Four Golden Signals by Google) we identify if the changes/issues are having impact on the service quality - if yes we report. The algorithms are based on machine learning, so we see things like sudden drops, slow responses, high error rates, etc.
>
> - Don't know Circonus in detail, so I cannot really compare it. Instana for sure is not time consuming as it was designed for auto discovery and zero configuration.
>
> - We have different types of integration using API and SDK. For alarms we have out-of-the-box integrations for PagerDuty, OpsGenie and Slack, but also provide a generic Webhook to integrate with any other system that has an API.
>
> - The agent is Apache Karaf based and can update itself using a private or public sensor (plugin) repository that is based on Apache Maven technology. We provide a set of sensors for approx. 40 technologies right now but enable the users to extend this with their own sensors.
>
> We can give you more insights or a demo at any time.
>
> Best regards
> Mirko
>
> On 2016-08-19 09:16 (+0200), Daniel Gruno wrote:
> > On 08/19/2016 08:20 AM, Christofer Dutz wrote:
> > Hi Chris,
> >
> > I knew that someone asked exactly the "how does it compare to datadog" question somewhere.
> > Here's the link to that thread: https://news.ycombinator.com/item?id=12147219
> >
> > And I can confirm the shortcomings of the time series approach, because in Jenkins, I'd say about 70% of recent failures of Flex builds were due to timeouts when uploading Maven artifacts to nexus. The current solution doesn't seem to detect that. Not only that, I couldn't see any hipchat notifications. The infra guys always had to start looking for the real reason of the timeouts as nexus wasn't having any problems at all.
> >
> > I'd say the real reason we weren't being notified is because httpd wasn't telling us it was jammed, so it would have made little difference whether we used product X or Y to monitor it, when it wasn't sending out data that could lead us to what the problem was. It wasn't a question of granularity, the data just wasn't enabled for any agent to see.
> >
> > I would imagine this would be something _instead_ of datadog, which leads me to some questions:
> >
> > - For how long into the future could we have a guarantee that this isn't gonna cost us $$$/year or whatever the price would end up being with a non-free version?
> >
> > - What is understood by real time checks here? And what exactly is checked?
> >
> > - How does this relate to more advanced monitoring systems like Circonus? As you may know, we had to drop that as it proved to be rather time consuming.
> >
> > - What sort of integrations does this system have? How are alerts dispatched?
> >
> > I'd also be interested in learning about plugins and how to customize the agents.
> >
> > With regards,
> > Daniel.
> >
> > And I really love the feature of tracking down the response time for one service back to other servers to find the real reason for a system being slow (have a look in the presentation for this.
> > There's a great slide on this)
> >
> > Chris
> >
> > Sent from my Samsung Galaxy smartphone.
> >
> > -------- Original message --------
> > From: Chris Lambertus
> > Date: 19.08.16 07:03 (GMT+01:00)
> > To: infrastructure-dev@apache.org, Christofer Dutz
> > Cc: mirko.novakovic@codecentric.de
> > Subject: Re: Tool proposal for helping run and monitor the ASF Infra Services
> >
> > Hiya Chris,
> >
> > Thanks for the info and the legwork on this. We currently use DataDog, which is very similar to what Instana appears to provide: an agent-based monitoring solution that gives us that kind of look into our infra. We also have a number of internal tools that report on various goings-on as well. You might see some of this in #asfinfra on hipchat from SNMP2HipChat. DataDog also reports various problems there, as does our monitoring via PingMyBox.
> >
> > Since you're not root@, you may not see some of the stuff that we see, but I think by and large, the majority of the monitors do direct to #asfinfra. Have you noticed gaps in the monitoring? Since we moved to DataDog, we've been quite happy with the resolution and metrics we've been able to get. It's been on my back burner for a while to expose some of our DD dashboards as public, but for right now it's somewhat limited access. In the interests of transparency (but not at the expense of security), I'd be happy to work with you to expose more of this, and I'm happy to address any questions or concerns about shortcomings in our monitoring.
> >
> > Many thanks to Instana for offering the ASF free services! I'd definitely like to hear more about what they might be able to offer on top of what we already get from DataDog. I'll take a look at the info you sent out.
> > Please feel free to follow up with me directly, either via email or hipchat.
> >
> > Cheers,
> > -Chris
> >
> > On Aug 18, 2016, at 2:13 AM, Christofer Dutz wrote:
> >
> > Hi,
> >
> > I have been on the Infra Hipchat for a few weeks now while trying to migrate the Flex project to Maven and back to the ASF Infra build system. Thanks for your support in this and even more thanks for the trust in granting me access and Admin rights on the windows1 build agent.
> >
> > In the chat I observed the daily work of you guys, having to maintain quite a zoo of all sorts of different systems on different platforms. Some problems you were having seem quite easy to track down ... if the hard disk is full, you clean up. But not all problems are that easy to track down. Thinking of the problems with repository.apache.org ... here the cause was the proxy being flooded with connections (I think this was the case) ... regular restarts of this helped temporarily, but I don't think that helps in the long term as no one had an idea why those connections were hanging there in the first place.
> >
> > A few years ago the company I work for - codecentric - founded a company called Instana. They are developing an agent-based system for monitoring IT infrastructure. In contrast to most established solutions, they use machine learning strategies to analyze the root cause of problems. While you can probably achieve similar results with normal tools, the problem is that you need very detailed domain knowledge to do so, and in a regularly changing environment you need to continuously keep adjusting your metrics. Instana does this automatically.
> > I think you can imagine how tricky it is to follow the root cause for bad response times through a network of interconnected services.
> >
> > Investing almost all of my free time (and a lot of my paid time) for Apache, noticing a lot of the problems you have to deal with every day, I asked Instana if they would be willing to provide their service to the ASF for free and they agreed and immediately set up a dedicated instance.
> >
> > I wanted to try the thing out as I would prefer to grab a few beers with you at ApacheCon in Seville and not get punched in the face for recommending something bad ;-) ... so I tried this on my private server playground. I unpacked and started the agent and the host appeared on the web console and reported the problems it was having (ones I didn't even know about) as well as other systems it communicates with ... as soon as I added agents on these machines the analytics started doing their work across systems and I built up a map view of my services and their correlation. So it's really a system that needs almost no configuration at all :-)
> >
> > I uploaded the internal product presentation here: https://public.centerdevice.de/1a9dc4ed-515e-482e-9fd6-6d60a5562598 (please don't share this outside of the ASF)
> >
> > Please use the password: 4p4cheR0cks (I'll remove that document in about two weeks)
> >
> > By the way ... the screenshots in the presentation are real ... I was amazed to see a 3D web UI in production for the first time ;-)
> >
> > So if there is any interest in this offer, I would be more than happy to provide credentials to you and assist you in getting started, so you could easily try it out. The guys at Instana would also be delighted to give you guys an online demo and answer any questions you might be having.
> > Feel free to contact Mirko directly for this: mirko.novakovic@codecentric.de
> >
> > Chris
>
> Mirko Novakovic | Executive Board member
> codecentric AG | Merscheider Straße 1 | 42699 Solingen | Germany
> tel: +49 (0) 212.23362811 | fax: +49 (0) 212.23362879 | mobile: +49 (0) 163.6681500
> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de
>
> Registered office: Solingen | HRB 25917 | Wuppertal District Court
> Executive Board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
> Supervisory Board: Patric Fedlmeier (Chairman) . Klaus Jäger . Jürgen Schütz
>
> This e-mail, including any attached files, contains confidential and/or legally protected information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail and any attached files without delay. Unauthorized copying, use or opening of any attached files, as well as unauthorized forwarding of this e-mail, is not permitted.