From dev-return-497-archive-asf-public=cust-asf.ponee.io@zipkin.apache.org  Tue Mar 19 11:29:18 2019
MIME-Version: 1.0
References: <CAHzwyDsTx4FJ2fQ14HDV1Fbk3SFZHHj_akkFNe=V3vtF+Cizag@mail.gmail.com>
 <62311575.20190319072019@gmail.com>
In-Reply-To: <62311575.20190319072019@gmail.com>
From: Adrian Cole <adrian.f.cole@gmail.com>
Date: Tue, 19 Mar 2019 19:29:30 +0800
Message-ID: <CAHzwyDt=R+Oo3thc-0p3sU0PbONwpbsFZKytN2XMA+iHqy3ppQ@mail.gmail.com>
Subject: Re: Perpetual support problems using Spark for dependency link aggregation
To: Andriy Redko <drreta@gmail.com>
Cc: dev@zipkin.apache.org
Content-Type: multipart/alternative; boundary="000000000000736fba058470d009"

--000000000000736fba058470d009
Content-Type: text/plain; charset="UTF-8"

Hi, Andriy

Thanks for responding. I don't think we can assume there will always be a
choice for streaming or online aggregation.

The two easiest ways out would be a Spark guru (ideally gurus) stepping
forward, or an easier-to-support alternative for after-the-fact aggregation
over large datasets that minimally works with MySQL, Elasticsearch and
Cassandra.
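
As a rough illustration of what after-the-fact aggregation means here, the
sketch below counts parent-to-child service calls over spans that have
already been pulled from storage. It is not what zipkin-dependencies does
today. The span field names (trace_id, id, parent_id, service) and the
function name are assumptions made up for this example, and a real job would
also have to handle shared spans, errors and datasets too large for memory.

```python
from collections import Counter


def aggregate_links(spans):
    """Count parent->child service calls across already-stored spans.

    Each span is a dict with hypothetical keys: trace_id, id, parent_id
    and service. These names are illustrative, not a real storage schema.
    """
    # Index spans by (trace_id, span_id) so each child can find its parent.
    by_id = {(s["trace_id"], s["id"]): s for s in spans}
    calls = Counter()
    for span in spans:
        parent = by_id.get((span["trace_id"], span.get("parent_id")))
        if parent is None:
            continue  # root span, or the parent was never stored
        # Only calls that cross a service boundary become links.
        if parent["service"] != span["service"]:
            calls[(parent["service"], span["service"])] += 1
    return [
        {"parent": p, "child": c, "callCount": n}
        for (p, c), n in sorted(calls.items())
    ]
```

Run once a day over the previous day's spans, something of this shape is the
"dumb cron job" case; the hard part is doing the same thing over large
datasets in each of the storage backends.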

-A

On Tue, Mar 19, 2019, 7:20 PM Andriy Redko <drreta@gmail.com> wrote:

> Hi Adrian,
>
> First of all, I want to confirm from personal experience that dependencies
> are often built after the fact, so there is a real need for this kind of
> job/component. There are many choices: either use the data processing
> engines you mentioned, or adopt a data store with aggregation capabilities
> (maybe ClickHouse, for example). What do you think would be the best route
> for Zipkin? Keep Spark but look for maintenance help? Or (re)write it
> altogether, ideally with no data engines needed? Just trying to understand
> how you envision it.
>
> Best Regards,
>     Andriy Redko
>
> AC> Hi, team.
>
> AC> A long time ago, we arbitrarily used Spark for dependency link
> AC> aggregation (porting the work from Eirik's Hadoop job). The initial
> AC> Spark job was created incomplete, then abandoned by the author. I've
> AC> tried a lot to support it, but it has been perpetual maintenance and
> AC> most of us have no idea how to support it. Yet, we get a lot of user
> AC> questions about it and the support load is higher than most of our
> AC> projects.
>
> AC> The Elasticsearch part is full of landmines, from the "wan only" stuff
> AC> to its narrow range of supported versions. It is rev-locked to a JRE
> AC> (even if that will change later). We've had users complain about CVE
> AC> maintenance and actively ask for a non-Spark option. General support
> AC> comes in as questions about cluster distribution, which no one knows
> AC> the answer to. I've recently, in desperation, added a change to help
> AC> show where Spark support is.
>
> AC> https://github.com/openzipkin/zipkin-dependencies/pull/133
>
> AC> All this said, despite the problems running distributed or with
> AC> Elasticsearch, most can start the zipkin-dependencies job as a
> AC> one-shot cron job without much help.
>
> AC> I think we have to be honest about the fact that since this project
> AC> started, we've rarely had anyone able to support it. I hope we can get
> AC> out of the mutually disappointing support swamp. Does anyone have any
> AC> ideas?
>
> AC> I would like to think someone could come in and save us, but it seems we
> AC> should also consider other tools, as that usually doesn't happen, and
> AC> one person saving us isn't sustainable (usually we need a few people
> AC> to know a tool in order to realistically support it). It is possible
> AC> to recruit for this, but we need significant close buy-in from people
> AC> who know spark imho, like actually helping with support, if we want to
> AC> continue this path.
>
> AC> I know there's a Kafka streaming option [1]. I also know some have
> AC> used Flink, and some have had interest in Pulsar. I think we should
> AC> have streaming options, but the fact is many don't use any buffer like
> AC> Kafka (they send directly over HTTP), which leads me to think we still
> AC> need an after-the-fact option (pull from storage). Moreover, Spark's
> AC> embedded mode is nice as it can be treated as a dumb cron job.
>
> AC> Looking for ideas,
> AC> -A
>
> AC> [1] https://github.com/sysco-middleware/zipkin-dependencies-streaming

--000000000000736fba058470d009--