From: Tathagata Das
Date: Mon, 4 May 2015 22:35:02 -0700
Subject: Re: Speeding up Spark build during development
To: Michael Armbrust
Cc: Meethu Mathew, dev@spark.apache.org

In addition to Michael's suggestion, in my SBT workflow I also use "~" to
automatically kick off the build and unit tests. For example:

    sbt/sbt "~streaming/test-only *BasicOperationsSuite*"

It automatically detects any file changes in the project and starts
compilation and testing. So my full workflow involves changing code in
IntelliJ and then continuously running unit tests in the background on the
command line using this "~".

TD

On Mon, May 4, 2015 at 2:49 PM, Michael Armbrust wrote:

> FWIW... My Spark SQL development workflow is usually to run "build/sbt
> sparkShell" or "build/sbt 'sql/test-only '". These commands start in as
> little as 30s on my laptop, automatically figure out which subprojects
> need to be rebuilt, and don't require the expensive assembly creation.
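[For context: the "~" prefix is sbt's triggered execution — it watches the project's sources and re-runs the given task whenever a file changes. As a rough sketch of the idea only (this is not sbt's actual implementation; the `.scala` filter and the command string are illustrative assumptions), a minimal polling watcher could look like this:]

```python
import os
import time


def snapshot(root):
    """Map each .scala file under root to its last-modified time."""
    mtimes = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".scala"):
                path = os.path.join(dirpath, name)
                mtimes[path] = os.path.getmtime(path)
    return mtimes


def watch(root, command, poll_seconds=1.0):
    """Re-run `command` whenever a .scala file under `root` changes,
    mimicking sbt's "~" triggered execution (polling, not inotify)."""
    before = snapshot(root)
    while True:
        time.sleep(poll_seconds)
        after = snapshot(root)
        if after != before:
            # e.g. command = 'sbt/sbt "streaming/test-only *BasicOperationsSuite*"'
            os.system(command)
            before = after
```

sbt itself does this incrementally and far more efficiently; the sketch just shows the watch-then-rerun loop behind the "~" workflow described above.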
>
> On Mon, May 4, 2015 at 5:48 AM, Meethu Mathew wrote:
>
> > Hi,
> >
> > Is it really necessary to run "mvn --projects assembly/ -DskipTests
> > install"? Could you please explain why this is needed?
> > I got the changes after running "mvn --projects streaming/ -DskipTests
> > package".
> >
> > Regards,
> > Meethu
> >
> > On Monday 04 May 2015 02:20 PM, Emre Sevinc wrote:
> >
> >> Just to give you an example:
> >>
> >> When I was trying to make a small change only to the Streaming
> >> component of Spark, first I built and installed the whole Spark
> >> project (this took about 15 minutes on my 4-core, 4 GB RAM laptop).
> >> Then, after having changed files only in Streaming, I ran something
> >> like (in the top-level directory):
> >>
> >>   mvn --projects streaming/ -DskipTests package
> >>
> >> and then
> >>
> >>   mvn --projects assembly/ -DskipTests install
> >>
> >> This was much faster than trying to build the whole of Spark from
> >> scratch, because Maven was only building one component of Spark, in
> >> my case the Streaming component. I think you can use a very similar
> >> approach.
> >>
> >> --
> >> Emre Sevinç
> >>
> >> On Mon, May 4, 2015 at 10:44 AM, Pramod Biligiri
> >> <pramodbiligiri@gmail.com> wrote:
> >>
> >>> No, I just need to build one project at a time. Right now Spark SQL.
> >>>
> >>> Pramod
> >>>
> >>> On Mon, May 4, 2015 at 12:09 AM, Emre Sevinc wrote:
> >>>
> >>>> Hello Pramod,
> >>>>
> >>>> Do you need to build the whole project every time? Generally you
> >>>> don't; e.g., when I was changing some files that belong only to
> >>>> Spark Streaming, I was building only the streaming module (of
> >>>> course after having built and installed the whole project, but that
> >>>> was done only once), and then the assembly. This was much faster
> >>>> than trying to build the whole of Spark every time.
> >>>>
> >>>> --
> >>>> Emre Sevinç
> >>>>
> >>>> On Mon, May 4, 2015 at 9:01 AM, Pramod Biligiri
> >>>> <pramodbiligiri@gmail.com> wrote:
> >>>>
> >>>>> Using the inbuilt Maven and Zinc it takes around 10 minutes for
> >>>>> each build. Is that reasonable?
> >>>>> My Maven opts look like this:
> >>>>>
> >>>>>   $ echo $MAVEN_OPTS
> >>>>>   -Xmx12000m -XX:MaxPermSize=2048m
> >>>>>
> >>>>> I'm running it as: build/mvn -DskipTests package
> >>>>>
> >>>>> Should I be tweaking my Zinc/Nailgun config?
> >>>>>
> >>>>> Pramod
> >>>>>
> >>>>> On Sun, May 3, 2015 at 3:40 PM, Mark Hamstra
> >>>>> <mark@clearstorydata.com> wrote:
> >>>>>
> >>>>>> https://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn
> >>>>>>
> >>>>>> On Sun, May 3, 2015 at 2:54 PM, Pramod Biligiri
> >>>>>> <pramodbiligiri@gmail.com> wrote:
> >>>>>>
> >>>>>>> This is great. I didn't know about the mvn script in the build
> >>>>>>> directory.
> >>>>>>>
> >>>>>>> Pramod
> >>>>>>>
> >>>>>>> On Fri, May 1, 2015 at 9:51 AM, York, Brennon
> >>>>>>> <Brennon.York@capitalone.com> wrote:
> >>>>>>>
> >>>>>>>> Following what Ted said, if you leverage the `mvn` from within
> >>>>>>>> the `build/` directory of Spark you'll get Zinc for free, which
> >>>>>>>> should help speed up build times.
> >>>>>>>>
> >>>>>>>> On 5/1/15, 9:45 AM, "Ted Yu" wrote:
> >>>>>>>>
> >>>>>>>>> Pramod:
> >>>>>>>>> Please remember to run Zinc so that the build is faster.
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>> On Fri, May 1, 2015 at 9:36 AM, Ulanov, Alexander wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Pramod,
> >>>>>>>>>>
> >>>>>>>>>> For cluster-like tests you might want to use the same code as
> >>>>>>>>>> in MLlib's LocalClusterSparkContext.
> >>>>>>>>>> You can rebuild only the package that you change and then run
> >>>>>>>>>> this main class.
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Alexander
> >>>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Pramod Biligiri [mailto:pramodbiligiri@gmail.com]
> >>>>>>>>>> Sent: Friday, May 01, 2015 1:46 AM
> >>>>>>>>>> To: dev@spark.apache.org
> >>>>>>>>>> Subject: Speeding up Spark build during development
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> I'm making some small changes to the Spark codebase and trying
> >>>>>>>>>> it out on a cluster. I was wondering if there's a faster way
> >>>>>>>>>> to build than running the package target each time.
> >>>>>>>>>> Currently I'm using: mvn -DskipTests package
> >>>>>>>>>>
> >>>>>>>>>> All the nodes have the same filesystem mounted at the same
> >>>>>>>>>> mount point.
> >>>>>>>>>>
> >>>>>>>>>> Pramod
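[Aside: the module-at-a-time recipe discussed in this thread — rebuild only the touched module(s), then the assembly — can be mechanized. This is a hedged sketch of the idea, assuming Maven module names match Spark's top-level source directories (true for streaming/, sql/, mllib/, etc., but not verified here for every module); the helper name is mine, not from any Spark tooling:]

```python
def mvn_commands(changed_files):
    """Given paths of edited files relative to the Spark root, return the
    minimal Maven command sequence for the workflow above: package each
    touched module, then rebuild the assembly."""
    # Assume the first path component is the Maven module directory.
    modules = sorted({path.split("/", 1)[0] for path in changed_files})
    commands = ["mvn --projects %s/ -DskipTests package" % m for m in modules]
    commands.append("mvn --projects assembly/ -DskipTests install")
    return commands
```

For example, after editing only files under streaming/, this yields the exact two commands Emre used: `mvn --projects streaming/ -DskipTests package` followed by `mvn --projects assembly/ -DskipTests install`.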
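[One more aside on the MAVEN_OPTS quoted earlier in the thread: `-Xmx12000m` grants a 12 GB heap, considerably more than the roughly 2 GB the Spark build documentation of that era suggested, so heap size is unlikely to be the bottleneck. A small, illustrative helper for sanity-checking such flags — the function names are mine, not from any Spark or Maven tooling:]

```python
def jvm_size_to_bytes(value):
    """Convert a JVM memory flag value like '12000m' or '2g' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    value = value.lower()
    if value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)  # bare number means bytes


def max_heap(maven_opts):
    """Extract the -Xmx setting from a MAVEN_OPTS string, in bytes
    (None if no -Xmx flag is present)."""
    for token in maven_opts.split():
        if token.startswith("-Xmx"):
            return jvm_size_to_bytes(token[len("-Xmx"):])
    return None
```

Running it on the MAVEN_OPTS from the thread, `max_heap("-Xmx12000m -XX:MaxPermSize=2048m")` confirms a heap of 12000 MiB.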