From: Christofer Dutz
To: "infrastructure-dev@apache.org"
Reply-To: infrastructure-dev@apache.org
Subject: AW: AW: Tool proposal for helping run and monitor the ASF Infra Services
Date: Wed, 24 Aug 2016 06:46:38 +0000
In-Reply-To: <347EE69B-877E-40D9-A407-5F77C373952B@codecentric.de>

Hi all,

How about trying it out for one chain of services? I bet that would explain a lot more than anything else. As I mentioned, the SaaS account is already set up and I can create some user accounts for you guys.

On the windows1 machine (hudson-win.apache.org, in directory "f:\instana-agent") I have the agent installed (but inactive), so you could have a look at the agent itself.

We could fire up the agent and start a Jenkins build, and it will definitely show us communication with the Jenkins machine, repository.apache.org and the proxy (well, probably with the proxy). So if we set up the agent on Windows, the proxy and repository.apache.org, I guess this should give a first impression. And if you don't like or want it in the end, just stop and delete the agent and no harm is done.

What do you think? I would be glad to assist you in this.

Chris

________________________________
From: Mirko Novakovic
Sent: Tuesday, 23 August 2016 16:04:52
To: infrastructure-dev@apache.org
Subject: Re: AW: Tool proposal for helping run and monitor the ASF Infra Services

Hi Daniel,

I am the Instana CEO and happy to answer your questions. First of all I would like to point out that we are an APM company/product, so more in the space of NewRelic than DataDog. But as we are monitoring the application as a whole (and model a dynamic dependency graph for it), we also monitor the infrastructure. We have out-of-the-box dashboards etc., but the core is our distributed tracing and service quality management, all automatic and without configuration.

Regarding the questions:

- I can guarantee that there is no time limit on this offering for Apache. The only thing we should discuss is the number of components you would monitor; as it is SaaS, there should be some maximum defined so that the costs for us do not explode. If you want to have Instana on premise, we wouldn't need that limit.

- Our whole system is streaming based, so we stream the data from the agent to the server, and from the server the data is pushed to the UI. On the backend we build a graph and apply a knowledge base on each component to find changes and issues in real time.
Based on service KPIs (the Four Golden Signals by Google) we identify whether the changes/issues have an impact on the service quality; if so, we report. The algorithms are based on machine learning, so we see things like sudden drops, slow responses, high error rates, etc.

- I don't know Circonus in detail, so I cannot really compare it. Instana for sure is not time consuming, as it was designed for auto discovery and zero configuration.

- We have different types of integration using API and SDK. For alarms we have out-of-the-box integrations for PagerDuty, OpsGenie and Slack, but also provide a generic webhook to integrate with any other system that has an API.

- The agent is Apache Karaf based and can update itself using a private or public sensor (plugin) repository that is based on Apache Maven technology. We provide a set of sensors for approx. 40 technologies right now, but enable users to extend this with their own sensors.

We can give you more insights or a demo at any time.

Best regards
Mirko

On 2016-08-19 09:16 (+0200), Daniel Gruno wrote:
> On 08/19/2016 08:20 AM, Christofer Dutz wrote:
> > Hi Chris,
> >
> > I knew that someone asked exactly the "how does it compare to datadog" question somewhere. Here's the link to that thread: https://news.ycombinator.com/item?id=12147219
> >
> > And I can confirm the shortcomings of the time series approach, because in Jenkins, I'd say about 70% of recent failures of Flex builds were due to timeouts when uploading Maven artifacts to Nexus. The current solution doesn't seem to detect that. Not only that, I couldn't see any HipChat notifications.
> > The infra guys always had to start looking for the real reason for the timeouts, as Nexus wasn't having any problems at all.
>
> I'd say the real reason we weren't being notified is because httpd wasn't telling us it was jammed, so it would have made little difference whether we used product X or Y to monitor it, when it wasn't sending out data that could lead us to what the problem was. It wasn't a question of granularity; the data just wasn't enabled for any agent to see.
>
> I would imagine this would be something _instead_ of datadog, which leads me to some questions:
>
> - For how long into the future could we have a guarantee that this isn't gonna cost us $$$/year or whatever the price would end up being with a non-free version?
>
> - What is understood by real time checks here? And what exactly is checked?
>
> - How does this relate to more advanced monitoring systems like Circonus? As you may know, we had to drop that as it proved to be rather time consuming.
>
> - What sort of integrations does this system have? How are alerts dispatched?
>
> I'd also be interested in learning about plugins and how to customize the agents.
>
> With regards,
> Daniel.
>
> > And I really love the feature of tracking down the response time for one service back to other servers to find the real reason for a system being slow (have a look in the presentation for this; there's a great slide on it).
> >
> > Chris
> >
> > Sent from my Samsung Galaxy smartphone.
> >
> > -------- Original message --------
> > From: Chris Lambertus
> > Date: 19.08.16 07:03 (GMT+01:00)
> > To: infrastructure-dev@apache.org, Christofer Dutz
> > Cc: mirko.novakovic@codecentric.de
> > Subject: Re: Tool proposal for helping run and monitor the ASF Infra Services
> >
> > Hiya Chris,
> >
> > Thanks for the info and the legwork on this.
> > We currently use DataDog, which is very similar to what Instana appears to provide: an agent-based monitoring solution that gives us that kind of look into our infra. We also have a number of internal tools that report on various goings-on as well. You might see some of this in #asfinfra on HipChat from SNMP2HipChat. DataDog also reports various problems there, as does our monitoring via PingMyBox.
> >
> > Since you're not root@, you may not see some of the stuff that we see, but I think by and large, the majority of the monitors do direct to #asfinfra. Have you noticed gaps in the monitoring? Since we moved to DataDog, we've been quite happy with the resolution and metrics we've been able to get. It's been on my back burner for a while to expose some of our DD dashboards as public, but for right now it's somewhat limited access. In the interests of transparency (but not at the expense of security), I'd be happy to work with you to expose more of this, and I'm happy to address any questions or concerns about shortcomings in our monitoring.
> >
> > Many thanks to Instana for offering the ASF free services! I'd definitely like to hear more about what they might be able to offer on top of what we already get from DataDog. I'll take a look at the info you sent out. Please feel free to follow up with me directly, either via email or HipChat.
> >
> > Cheers,
> > -Chris
> >
> > > On Aug 18, 2016, at 2:13 AM, Christofer Dutz wrote:
> > >
> > > Hi,
> > >
> > > I have been on the Infra HipChat for a few weeks now while trying to migrate the Flex project to Maven and back to the ASF Infra build system. Thanks for your support in this, and even more thanks for the trust in granting me access and admin rights on the windows1 build agent.
> > >
> > > In the chat I observed the daily work of you guys, having to maintain quite a zoo of all sorts of different systems on different platforms.
> > > Some problems you were having seem quite easy to track down ... if the hard disk is full, you clean up. But not all problems are that easy to track down. Thinking of the problems with repository.apache.org ... here the cause was the proxy being flooded with connections (I think this was the case) ... regular restarts of this helped temporarily, but I don't think that helps in the long term, as no one had an idea why those connections were hanging there in the first place.
> > >
> > > A few years ago the company I work for, codecentric, founded a company called Instana. They are developing an agent-based system for monitoring IT infrastructure. In contrast to most established solutions, they use machine learning strategies to analyze the root cause of problems. While you can probably achieve similar results with normal tools, the problem is that you need very detailed domain knowledge to do so, and in a regularly changing environment you need to continuously keep adjusting your metrics. Instana does this automatically. I think you can imagine how tricky it is to follow the root cause for bad response times through a network of interconnected services.
> > >
> > > Investing almost all of my free time (and a lot of my paid time) for Apache, and noticing a lot of the problems you have to deal with every day, I asked Instana if they would be willing to provide their service to the ASF for free. They agreed and immediately set up a dedicated instance.
> > >
> > > I wanted to try the thing out, as I would prefer to grab a few beers with you at ApacheCon in Seville and not get punched in the face for recommending something bad ;-) ... so I tried this on my private server playground. I unpacked and started the agent and the host appeared on the web console and reported the problems it was having (ones I didn't even know about) as well as other systems it communicates with ...
> > > as soon as I added agents on these machines the analytics started doing their work across systems and I built up a map view of my services and their correlation. So it's really a system that needs almost no configuration at all :-)
> > >
> > > I uploaded the internal product presentation here: https://public.centerdevice.de/1a9dc4ed-515e-482e-9fd6-6d60a5562598 (please don't share this outside of the ASF)
> > >
> > > Please use the password: 4p4cheR0cks (I'll remove that document in about two weeks)
> > >
> > > By the way ... the screenshots in the presentation are real ... I was amazed to see a 3D web UI in production for the first time ;-)
> > >
> > > So if there is any interest in this offer, I would be more than happy to provide credentials to you and assist you in getting started, so you could easily try it out. The guys at Instana would also be delighted to give you guys an online demo and answer any questions you might have. Feel free to contact Mirko directly for this: mirko.novakovic@codecentric.de
> > >
> > > Chris

Mirko Novakovic | Executive Board
codecentric AG | Merscheider Straße 1 | 42699 Solingen | Germany
tel: +49 (0) 212.23362811 | fax: +49 (0) 212.23362879 | mobile: +49 (0) 163.6681500
www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de

Registered office: Solingen | HRB 25917 | Amtsgericht Wuppertal
Executive Board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
Supervisory Board: Patric Fedlmeier (Chairman) . Klaus Jäger . Jürgen Schütz

This e-mail, including any attached files, contains confidential and/or legally protected information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail and any attached files without delay. Unauthorized copying, use or opening of any attached files, as well as unauthorized forwarding of this e-mail, is not permitted.