zipkin-dev mailing list archives

From Andriy Redko <>
Subject Re: Perpetual support problems using Spark for dependency link aggregation
Date Wed, 20 Mar 2019 12:24:58 GMT
Hi Adrian,

Got it. I am drafting an email to the Spark users list asking for help. There is a
chance that someone will either step in or pass the message around, so we could at
least try to attract some contributors. Do you think that makes sense? Thanks!

Best Regards,
    Andriy Redko

AC> Hi, Andriy

AC> Thanks for responding. I don't think we can assume there will always be a choice
AC> for streaming or online aggregation.

AC> The two easiest ways out would be a Spark guru (ideally gurus) stepping forward,
AC> or an easier-to-support alternative for after-the-fact aggregation over large
AC> datasets that minimally works with MySQL, ES and Cassandra.

AC> -A
AC> On Tue, Mar 19, 2019, 7:20 PM Andriy Redko <> wrote:

AC> Hi Adrian,

AC>  First of all, I want to confirm from personal experience that the dependencies
AC>  are often built after the fact, so there is a real need for this kind of job/component.

AC>  There are many choices: either use the data processing engines you mentioned,
AC>  or onboard a data store with aggregation capabilities (maybe ClickHouse, for
AC>  example). What do you think would be the best route for Zipkin? Keep Spark but
AC>  look for maintenance help? Or (re)write it altogether, ideally with no data
AC>  engines needed? Just trying to understand how you envision it.

AC>  Best Regards,
AC>      Andriy Redko

 AC>> Hi, team.

 AC>> A long time ago, we arbitrarily used Spark for dependency link
 AC>> aggregation (porting the work from Eirik's Hadoop job). The initial
 AC>> spark job was created incomplete then abandoned by the author. I've
 AC>> tried a lot to support it, but it has been perpetual maintenance and
 AC>> most of us have no idea how to support it. Yet, we get a lot of user
 AC>> questions about it and the support load is higher than most of our
 AC>> projects.

 AC>> The Elasticsearch part is full of landmines, from the "wan only" stuff
 AC>> to them having a narrow supported range of versions. It is rev-locked to
 AC>> a JRE (even if that will change later). We've had users complain about CVE
 AC>> maintenance and actively ask for a non-spark option. General support
 AC>> comes in questions about cluster distribution which no-one knows the
 AC>> answer to. I've recently in desperation added a change to help show
 AC>> where Spark support is.


 AC>> All this said, despite the problems running distributed or with
 AC>> elasticsearch, most can start the zipkin-dependencies job as a
 AC>> one-shot cron job without much help.

 AC>> I think we have to be honest about the fact that since this project
 AC>> started, we've rarely had anyone able to support it. I hope we can get
 AC>> out of the mutually disappointing support swamp. Does anyone have any
 AC>> ideas?

 AC>> I would like to think someone could come in and save us, but it seems
 AC>> we should also consider other tools as that usually doesn't happen, and
 AC>> one person saving us isn't sustainable (usually we need a few people
 AC>> to know a tool in order to realistically support it). It is possible
 AC>> to recruit for this, but we need significant close buy-in from people
 AC>> who know spark imho, like actually helping with support, if we want to
 AC>> continue this path.

 AC>> I know there's a Kafka streaming option [1]. I also know some have
 AC>> used Flink, and some have had interest in Pulsar. I think we should
 AC>> have streaming options, but the fact is many don't use any buffer like
 AC>> Kafka (direct HTTP), which leads me to think we still need an
 AC>> after-the-fact option (pull from storage). Moreover, Spark's embedded
 AC>> mode is nice as it can be treated as a dumb cron job.

 AC>> Looking for ideas,
 AC>> -A

 AC>> [1]

