From: Michael Heuer
Date: Tue, 27 Feb 2018 09:36:06 -0600
Subject: Re: Please keep s3://spark-related-packages/ alive
To: Sean Owen
Cc: Nicholas Chammas, Spark dev list

On Tue, Feb 27, 2018 at 8:17 AM, Sean Owen <srowen@gmail.com> wrote:

> See http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-d3kbcqa49mib13-cloudfront-net-td22427.html
> -- it was 'retired', yes.
>
> Agree with all that, though they're intended for occasional individual use
> and not a case where performance and uptime matter. For that, I think you'd
> want to just host your own copy of the bits you need.
>
> The notional problem was that the S3 bucket wasn't obviously
> controlled/blessed by the ASF and yet was a source of official bits. It was
> another set of third-party credentials to hand around to release managers,
> which was IIRC a little problematic.
>
> Homebrew does host distributions of ASF projects, like Spark, FWIW.

To clarify, the apache-spark.rb formula in Homebrew uses the Apache mirror
closer.lua script

https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-spark.rb#L4

   michael

> On Mon, Feb 26, 2018 at 10:57 PM Nicholas Chammas
> <nicholas.chammas@gmail.com> wrote:
>
>> If you go to the Downloads page and download Spark 2.2.1, you’ll get a
>> link to an Apache mirror. It didn’t use to be this way. As recently as
>> Spark 2.2.0, downloads were served via CloudFront, which was backed by an
>> S3 bucket named spark-related-packages.
>>
>> It seems that we’ve stopped using CloudFront, and the S3 bucket behind it
>> has stopped receiving updates (e.g. Spark 2.2.1 isn’t there). I’m guessing
>> this is part of an effort to use the Apache mirror network, like other
>> Apache projects do.
>>
>> From a user perspective, the Apache mirror network is several steps down
>> from using a modern CDN. Let me summarize why:
>>
>>    1. *Apache mirrors are often slow.* Apache does not impose any
>>    performance requirements on its mirrors. The difference between
>>    getting a good mirror and a bad one means downloading Spark in less
>>    than a minute vs. 20 minutes. The problem is so bad that I’ve thought
>>    about adding an Apache mirror blacklist to Flintrock to avoid getting
>>    one of these dud mirrors.
>>    2. *Apache mirrors are inconvenient to use.* When you download
>>    something from an Apache mirror, you get a link like this one.
>>    Instead of automatically redirecting you to your download, though,
>>    you need to process the results you get back to find your download
>>    target. And you need to handle the high download failure rate, since
>>    sometimes the mirror you get doesn’t have the file it claims to have.
>>    3. *Apache mirrors are incomplete.* Apache mirrors only keep around
>>    the latest releases, save for a few “archive” mirrors, which are
>>    often slow. So if you want to download anything but the latest
>>    version of Spark, you are out of luck.
>>
>> Some of these problems can be mitigated by picking a specific mirror that
>> works well and hardcoding it in your scripts, but that defeats the purpose
>> of dynamically selecting a mirror and makes you a “bad” user of the mirror
>> network.
>>
>> I raised some of these issues over on INFRA-10999. The ticket sat for a
>> year before I heard anything back, and the bottom line was that none of
>> the above problems have a solution on the horizon. It’s fine. I understand
>> that Apache is a volunteer organization and that the infrastructure team
>> has a lot to manage as it is. I still find it disappointing that an
>> organization of Apache’s stature doesn’t have a better solution for this
>> in collaboration with a third party. Python serves PyPI downloads using
>> Fastly and Homebrew serves packages using Bintray. They both work really,
>> really well. Why don’t we have something as good for Apache projects?
>> Anyway, that’s a separate discussion.
>>
>> What I want to say is this:
>>
>> Dear whoever owns the spark-related-packages S3 bucket,
>>
>> Please keep the bucket up-to-date with the latest Spark releases,
>> alongside the past releases that are already on there. It’s a huge help to
>> the Flintrock project, and it’s an equally big help to those of us writing
>> infrastructure automation scripts that deploy Spark in other contexts.
>>
>> I understand that hosting this stuff is not free, and that I am not
>> paying anything for this service. If it needs to go, so be it. But I
>> wanted to take this opportunity to lay out the benefits I’ve enjoyed
>> thanks to having this bucket around, and to make sure that if it did die,
>> it didn’t die a quiet death.
>>
>> Sincerely,
>> Nick
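
For anyone scripting around the mirror network, the workflow described in points 2 and 3 of the quoted message (ask closer.lua for a mirror, process the reply, cope with a mirror that lacks the file, fall back to an archive host for older releases) might look roughly like the sketch below. It is not taken from Flintrock or from the Homebrew formula. It assumes that closer.lua accepts an `as_json=1` parameter and replies with `preferred` and `http` fields, and that `https://archive.apache.org/dist/` carries past releases; the helper names are hypothetical, so check a live response before relying on any of it.

```python
import json
from typing import Callable, Iterable, List, Optional, Tuple
from urllib.parse import urlencode

# closer.lua is named in the thread; the query parameters are an assumption.
CLOSER_URL = "https://www.apache.org/dyn/closer.lua"
# Assumed archive host used as a last resort for releases mirrors have dropped.
ARCHIVE_BASE = "https://archive.apache.org/dist/"


def closer_query_url(path: str) -> str:
    """Build a mirror-lookup URL that asks for JSON rather than an HTML page."""
    return CLOSER_URL + "?" + urlencode({"path": path, "as_json": 1}, safe="/")


def candidate_urls(closer_json: str, path: str) -> List[str]:
    """Turn a closer.lua JSON reply into an ordered, de-duplicated URL list.

    Order: preferred mirror, other HTTP mirrors, then the archive host.
    The field names "preferred" and "http" are assumptions about the reply.
    """
    info = json.loads(closer_json)
    bases = [info.get("preferred")] + list(info.get("http", [])) + [ARCHIVE_BASE]
    urls: List[str] = []
    for base in bases:
        if not base:
            continue
        url = base.rstrip("/") + "/" + path.lstrip("/")
        if url not in urls:
            urls.append(url)
    return urls


def first_working(
    urls: Iterable[str], fetch: Callable[[str], bytes]
) -> Optional[Tuple[str, bytes]]:
    """Try each URL in turn; skip any whose fetch raises OSError."""
    for url in urls:
        try:
            return url, fetch(url)
        except OSError:
            # The mirror was unreachable or did not have the file; try the next.
            continue
    return None
```

To run this against the real service, fetch `closer_query_url(path)` with `urllib.request.urlopen`, pass the decoded reply to `candidate_urls`, and supply a `fetch` that raises `OSError` on failure (`urllib.error.URLError` is a subclass of it). Taking the fetcher as a parameter keeps the retry logic testable without network access.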