zipkin-dev mailing list archives

From Willem Jiang
Subject Re: Perpetual support problems using Spark for dependency link aggregation
Date Tue, 19 Mar 2019 03:33:53 GMT
Hi Adrian,

Thanks for the briefing on the Spark support problem.
We can always ask the community for help, as long as we provide
enough context.
Maybe we can add a page to the Zipkin wiki and tweet about it,
to see if we can attract some contributors.
In my experience, if we have a great idea, it won't take long
to find help from the open source community.


Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Tue, Mar 19, 2019 at 8:51 AM Adrian Cole wrote:
> Hi, team.
> A long time ago, we arbitrarily used spark for dependency link
> aggregation (porting the work from Eirik's hadoop job). The initial
> spark job was created incomplete then abandoned by the author. I've
> tried a lot to support it, but it has been perpetual maintenance and
> most of us have no idea how to support it. Yet, we get a lot of user
> questions about it and the support load is higher than most of our
> projects.
> The Elasticsearch part is full of landmines, from the "wan only" stuff to
> its narrow range of supported versions. It is rev-locked to a JRE
> (even if that will change later). We've had users complain about CVE
> maintenance and actively ask for a non-spark option. General support
> comes in questions about cluster distribution which no-one knows the
> answer to. I've recently in desperation added a change to help show
> where Spark support is.
> All this said, despite the problems running distributed or with
> elasticsearch, most can start the zipkin-dependencies job as a
> one-shot cron job without much help.
> I think we have to be honest about the fact that since this project
> started, we've rarely had anyone able to support it. I hope we can get
> out of the mutually disappointing support swamp. Does anyone have any
> ideas?
> I would like to think someone could come in and save us, but it seems we
> should also consider other tools as that usually doesn't happen, and
> one person saving us isn't sustainable (usually we need a few people
> to know a tool in order to realistically support it). It is possible
> to recruit for this, but we need significant close buy-in from people
> who know spark imho, like actually helping with support, if we want to
> continue this path.
> I know there's a Kafka streaming option [1]. I also know some have
> used Flink, and some have had interest in Pulsar. I think we should
> have streaming options, but the fact is many don't use any buffer like
> Kafka (direct http), which leads me to think we still need an
> after-the-fact option (pull from storage). Moreover, spark's embedded
> mode is nice as it can be treated as a dumb cron job.
> Looking for ideas,
> -A
> [1]
