MIME-Version: 1.0
References: <62311575.20190319072019@gmail.com>
In-Reply-To: <62311575.20190319072019@gmail.com>
From: Adrian Cole
Date: Tue, 19 Mar 2019 19:29:30 +0800
Subject: Re: Perpetual support problems using Spark for dependency link aggregation
To: Andriy Redko
Cc: dev@zipkin.apache.org
Content-Type: multipart/alternative; boundary="000000000000736fba058470d009"
--000000000000736fba058470d009
Content-Type: text/plain; charset="UTF-8"

Hi, Andriy

Thanks for responding. I don't think we can assume there will always be a
choice for streaming or online aggregation. The two easiest ways out would
be a Spark guru (ideally gurus) stepping forward, or an easier-to-support
alternative for after-the-fact aggregation over large datasets that
minimally works with MySQL, Elasticsearch, and Cassandra.

-A

On Tue, Mar 19, 2019, 7:20 PM Andriy Redko wrote:

> Hi Adrian,
>
> First of all, I want to confirm from personal experience that the
> dependencies are often built after the fact, so there is a real need for
> this kind of job/component. There are many choices: either use the data
> processing engines you mentioned, or onboard a data store with aggregation
> capabilities (maybe ClickHouse, for example). What do you think would be
> the best route for Zipkin? Keep Spark but look for maintenance help? Or
> (re)write it altogether, ideally with no data engines needed? Just trying
> to understand how you envision it.
>
> Best Regards,
> Andriy Redko
>
> AC> Hi, team.
>
> AC> A long time ago, we arbitrarily used Spark for dependency link
> AC> aggregation (porting the work from Eirik's Hadoop job). The initial
> AC> Spark job was created incomplete, then abandoned by the author. I've
> AC> tried a lot to support it, but it has been perpetual maintenance and
> AC> most of us have no idea how to support it. Yet, we get a lot of user
> AC> questions about it and the support load is higher than most of our
> AC> projects.
>
> AC> The Elasticsearch part is landmines, from the "wan only" stuff to them
> AC> having a narrow supported range of versions. It is rev-locked to a JRE
> AC> (even if that will change later). We've had users complain about CVE
> AC> maintenance and actively ask for a non-Spark option. General support
> AC> comes in questions about cluster distribution which no one knows the
> AC> answer to.
> AC> I've recently, in desperation, added a change to help show
> AC> where Spark support is.
>
> AC> https://github.com/openzipkin/zipkin-dependencies/pull/133
>
> AC> All this said, despite the problems running distributed or with
> AC> Elasticsearch, most can start the zipkin-dependencies job as a
> AC> one-shot cron job without much help.
>
> AC> I think we have to be honest about the fact that since this project
> AC> started, we've rarely had anyone able to support it. I hope we can get
> AC> out of the mutually disappointing support swamp. Does anyone have any
> AC> ideas?
>
> AC> I would like to think someone could come in and save us, but it seems
> AC> we should also consider other tools, as that usually doesn't happen,
> AC> and one person saving us isn't sustainable (usually we need a few
> AC> people to know a tool in order to realistically support it). It is
> AC> possible to recruit for this, but we need significant close buy-in
> AC> from people who know Spark imho, like actually helping with support,
> AC> if we want to continue this path.
>
> AC> I know there's a Kafka streaming option [1]. I also know some have
> AC> used Flink, and some have had interest in Pulsar. I think we should
> AC> have streaming options, but the fact is many don't use any buffer like
> AC> Kafka (direct http), which leads me to think we still need an
> AC> after-the-fact option (pull from storage). Moreover, Spark's embedded
> AC> mode is nice as it can be treated as a dumb cron job.
>
> AC> Looking for ideas,
> AC> -A
>
> AC> [1] https://github.com/sysco-middleware/zipkin-dependencies-streaming
>
> AC> ---------------------------------------------------------------------
> AC> To unsubscribe, e-mail: dev-unsubscribe@zipkin.apache.org
> AC> For additional commands, e-mail: dev-help@zipkin.apache.org

--000000000000736fba058470d009--