From: Tathagata Das
Date: Mon, 4 May 2015 22:35:02 -0700
Subject: Re: Speeding up Spark build during development
To: Michael Armbrust
Cc: Meethu Mathew, dev@spark.apache.org

In addition to Michael's suggestion, in my SBT workflow I also use "~" to
automatically kick off the build and unit tests. For example:

    sbt/sbt "~streaming/test-only *BasicOperationsSuite*"

It automatically detects any file changes in the project and starts
compilation and testing. So my full workflow involves changing code in
IntelliJ and then continuously running unit tests in the background on the
command line using this "~".

TD

On Mon, May 4, 2015 at 2:49 PM, Michael Armbrust wrote:

> FWIW... My Spark SQL development workflow is usually to run "build/sbt
> sparkShell" or "build/sbt 'sql/test-only '". These commands start in as
> little as 30s on my laptop, automatically figure out which subprojects
> need to be rebuilt, and don't require the expensive assembly creation.
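[For context: the "~" prefix is sbt's triggered execution — it watches the project's sources and re-runs the given task whenever a file changes. As a rough sketch of the idea only (this is not sbt's actual implementation; the `.scala` filter and the command string are illustrative assumptions), a minimal polling watcher could look like this:]

```python
import os
import time


def snapshot(root):
    """Map each .scala file under root to its last-modified time."""
    mtimes = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".scala"):
                path = os.path.join(dirpath, name)
                mtimes[path] = os.path.getmtime(path)
    return mtimes


def watch(root, command, poll_seconds=1.0):
    """Re-run `command` whenever a .scala file under `root` changes,
    mimicking sbt's "~" triggered execution (polling, not inotify)."""
    before = snapshot(root)
    while True:
        time.sleep(poll_seconds)
        after = snapshot(root)
        if after != before:
            # e.g. command = 'sbt/sbt "streaming/test-only *BasicOperationsSuite*"'
            os.system(command)
            before = after
```

sbt itself does this incrementally and far more efficiently; the sketch just shows the watch-then-rerun loop behind the "~" workflow described above.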
>
> On Mon, May 4, 2015 at 5:48 AM, Meethu Mathew wrote:
>
> > Hi,
> >
> > Is it really necessary to run "mvn --projects assembly/ -DskipTests
> > install"? Could you please explain why this is needed?
> > I got the changes after running "mvn --projects streaming/ -DskipTests
> > package".
> >
> > Regards,
> > Meethu
> >
> > On Monday 04 May 2015 02:20 PM, Emre Sevinc wrote:
> >
> >> Just to give you an example:
> >>
> >> When I was trying to make a small change only to the Streaming
> >> component of Spark, first I built and installed the whole Spark
> >> project (this took about 15 minutes on my 4-core, 4 GB RAM laptop).
> >> Then, after having changed files only in Streaming, I ran something
> >> like (in the top-level directory):
> >>
> >>   mvn --projects streaming/ -DskipTests package
> >>
> >> and then
> >>
> >>   mvn --projects assembly/ -DskipTests install
> >>
> >> This was much faster than trying to build the whole of Spark from
> >> scratch, because Maven was only building one component of Spark, in
> >> my case the Streaming component. I think you can use a very similar
> >> approach.
> >>
> >> --
> >> Emre Sevinç
> >>
> >> On Mon, May 4, 2015 at 10:44 AM, Pramod Biligiri
> >> <pramodbiligiri@gmail.com> wrote:
> >>
> >>> No, I just need to build one project at a time. Right now Spark SQL.
> >>>
> >>> Pramod
> >>>
> >>> On Mon, May 4, 2015 at 12:09 AM, Emre Sevinc wrote:
> >>>
> >>>> Hello Pramod,
> >>>>
> >>>> Do you need to build the whole project every time? Generally you
> >>>> don't; e.g., when I was changing some files that belong only to
> >>>> Spark Streaming, I was building only the streaming module (of
> >>>> course after having built and installed the whole project, but that
> >>>> was done only once), and then the assembly. This was much faster
> >>>> than trying to build the whole of Spark every time.
> >>>>
> >>>> --
> >>>> Emre Sevinç
> >>>>
> >>>> On Mon, May 4, 2015 at 9:01 AM, Pramod Biligiri
> >>>> <pramodbiligiri@gmail.com> wrote:
> >>>>
> >>>>> Using the inbuilt Maven and Zinc it takes around 10 minutes for
> >>>>> each build. Is that reasonable?
> >>>>> My Maven opts look like this:
> >>>>>
> >>>>>   $ echo $MAVEN_OPTS
> >>>>>   -Xmx12000m -XX:MaxPermSize=2048m
> >>>>>
> >>>>> I'm running it as: build/mvn -DskipTests package
> >>>>>
> >>>>> Should I be tweaking my Zinc/Nailgun config?
> >>>>>
> >>>>> Pramod
> >>>>>
> >>>>> On Sun, May 3, 2015 at 3:40 PM, Mark Hamstra
> >>>>> <mark@clearstorydata.com> wrote:
> >>>>>
> >>>>>> https://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn
> >>>>>>
> >>>>>> On Sun, May 3, 2015 at 2:54 PM, Pramod Biligiri
> >>>>>> <pramodbiligiri@gmail.com> wrote:
> >>>>>>
> >>>>>>> This is great. I didn't know about the mvn script in the build
> >>>>>>> directory.
> >>>>>>>
> >>>>>>> Pramod
> >>>>>>>
> >>>>>>> On Fri, May 1, 2015 at 9:51 AM, York, Brennon
> >>>>>>> <Brennon.York@capitalone.com> wrote:
> >>>>>>>
> >>>>>>>> Following what Ted said, if you leverage the `mvn` from within
> >>>>>>>> the `build/` directory of Spark you'll get Zinc for free, which
> >>>>>>>> should help speed up build times.
> >>>>>>>>
> >>>>>>>> On 5/1/15, 9:45 AM, "Ted Yu" wrote:
> >>>>>>>>
> >>>>>>>>> Pramod:
> >>>>>>>>> Please remember to run Zinc so that the build is faster.
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>> On Fri, May 1, 2015 at 9:36 AM, Ulanov, Alexander wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Pramod,
> >>>>>>>>>>
> >>>>>>>>>> For cluster-like tests you might want to use the same code as
> >>>>>>>>>> in MLlib's LocalClusterSparkContext.
> >>>>>>>>>> You can rebuild only the package that you change and then run
> >>>>>>>>>> this main class.
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Alexander
> >>>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Pramod Biligiri [mailto:pramodbiligiri@gmail.com]
> >>>>>>>>>> Sent: Friday, May 01, 2015 1:46 AM
> >>>>>>>>>> To: dev@spark.apache.org
> >>>>>>>>>> Subject: Speeding up Spark build during development
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> I'm making some small changes to the Spark codebase and trying
> >>>>>>>>>> it out on a cluster. I was wondering if there's a faster way
> >>>>>>>>>> to build than running the package target each time.
> >>>>>>>>>> Currently I'm using: mvn -DskipTests package
> >>>>>>>>>>
> >>>>>>>>>> All the nodes have the same filesystem mounted at the same
> >>>>>>>>>> mount point.
> >>>>>>>>>>
> >>>>>>>>>> Pramod
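[Aside: the module-at-a-time recipe discussed in this thread — rebuild only the touched module(s), then the assembly — can be mechanized. This is a hedged sketch of the idea, assuming Maven module names match Spark's top-level source directories (true for streaming/, sql/, mllib/, etc., but not verified here for every module); the helper name is mine, not from any Spark tooling:]

```python
def mvn_commands(changed_files):
    """Given paths of edited files relative to the Spark root, return the
    minimal Maven command sequence for the workflow above: package each
    touched module, then rebuild the assembly."""
    # Assume the first path component is the Maven module directory.
    modules = sorted({path.split("/", 1)[0] for path in changed_files})
    commands = ["mvn --projects %s/ -DskipTests package" % m for m in modules]
    commands.append("mvn --projects assembly/ -DskipTests install")
    return commands
```

For example, after editing only files under streaming/, this yields the exact two commands Emre used: `mvn --projects streaming/ -DskipTests package` followed by `mvn --projects assembly/ -DskipTests install`.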
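[One more aside on the MAVEN_OPTS quoted earlier in the thread: `-Xmx12000m` grants a 12 GB heap, considerably more than the roughly 2 GB the Spark build documentation of that era suggested, so heap size is unlikely to be the bottleneck. A small, illustrative helper for sanity-checking such flags — the function names are mine, not from any Spark or Maven tooling:]

```python
def jvm_size_to_bytes(value):
    """Convert a JVM memory flag value like '12000m' or '2g' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    value = value.lower()
    if value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)  # bare number means bytes


def max_heap(maven_opts):
    """Extract the -Xmx setting from a MAVEN_OPTS string, in bytes
    (None if no -Xmx flag is present)."""
    for token in maven_opts.split():
        if token.startswith("-Xmx"):
            return jvm_size_to_bytes(token[len("-Xmx"):])
    return None
```

Running it on the MAVEN_OPTS from the thread, `max_heap("-Xmx12000m -XX:MaxPermSize=2048m")` confirms a heap of 12000 MiB.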