From: Matt Goodman
Date: Sat, 11 Jul 2015 11:07:12 -0700
Subject: Re: Should spark-ec2 get its own repo?
To: dev@spark.apache.org

I wanted to revive the conversation about the spark-ec2 tools, as it seems to have been lost in the 1.4.1 release voting spree.

I think that splitting it into its own repository is a really good move, and I would also be happy to help with this transition, as well as help maintain the resulting repository. Here is my justification for why we ought to do this split.

User Facing:

- The spark-ec2 launcher doesn't use anything in the parent spark repository.
- The spark-ec2 version is disjoint from the parent repo. I consider it confusing that the spark-ec2 script doesn't launch the version of spark it is checked out with.
- Someone interested in setting up spark-ec2 with anything but the default configuration will have to clone at least 2 repositories at present, and probably fork and push changes to one of them.
- spark-ec2 has mismatched dependencies with respect to spark itself. This includes a confusing shim in the spark-ec2 script to install boto, which frankly should just be a dependency of the script.

Developer Facing:

- Support across 2 repos will be worse than across 1. It's unclear where to file issues/PRs, and even fairly trivial stuff requires extra communication.
- spark-ec2 also depends on a number of binary blobs being in the right place. Currently the responsibility for these is decentralized, and likely prone to various flavors of dumb.
- The current flow of booting a spark-ec2 cluster is _complicated_. I spent the better part of a couple of days figuring out how to integrate our custom tools into this stack. This is very hard to fix when commits/PRs need to span groups/repositories/buckets-o-binary. I am sure there are several other problems languishing under similar roadblocks.
- A split makes testing possible. The spark-ec2 script is a great case for CI given the number of permutations of launch criteria there are. I suspect AWS would be happy to foot the bill for spark-ec2 testing (probably ~20 bucks a month based on some envelope sketches), as it is a piece of software that directly leads to other people giving them money. I have some contacts there, and I am pretty sure this would be an easy conversation, particularly if the repo is directly concerned with ec2. Think also of being able to assemble the binary blobs into an S3 bucket dedicated to spark-ec2.

Any other thoughts/voices appreciated here. spark-ec2 is a super-power tool and deserves a fair bit of attention!

--Matthew Goodman

=====================
Check Out My Website: http://craneium.net
Find me on LinkedIn: http://tinyurl.com/d6wlch