Mailing-List: contact user-help@spark.apache.org; run by ezmlm
Precedence: bulk
MIME-Version: 1.0
In-Reply-To: 
 <CAO1P4pqgGG3DmaJT9akz+L1hSkGsPnxV2_eyrdAJ+ZXQQMzTJw@mail.gmail.com>
References: 
 <CAO1P4pqgGG3DmaJT9akz+L1hSkGsPnxV2_eyrdAJ+ZXQQMzTJw@mail.gmail.com>
From: Michael Armbrust <michael@databricks.com>
Date: Wed, 7 Oct 2015 18:47:50 -0700
Message-ID: 
 <CAAswR-7h3J-Bvd_bnfuZa_jPmPJ9tJVfpSvJRrJyE_bH1kqivA@mail.gmail.com>
Subject: Re: SparkSQL: First query execution is always slower than subsequent
 queries
To: Lloyd Haris <lloydharis@gmail.com>
Cc: user <user@spark.apache.org>
Content-Type: multipart/alternative; boundary=001a113dbf4c232ed505218e0e24

--001a113dbf4c232ed505218e0e24
Content-Type: text/plain; charset=UTF-8

-dev +user

> 1). Is that the reason why it's always slow in the first run? Or are there
> any other reasons? Apparently it loads data to memory every time so it
> shouldn't be something to do with disk read should it?
>

You are probably seeing the effect of the JVM's JIT compiler.  The first
run is executing in interpreted mode.  Once the JVM sees that a piece of
code is hot, it will compile it to native code.  This applies both to
Spark / Spark SQL itself and (as of Spark 1.5) to the code that we
dynamically generate for expression evaluation.  Multiple runs with the
same expressions will use cached generated code that might already have
been JITed.
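
If you want to see the warm-up effect in isolation, here is a minimal
sketch in plain Java.  It is not Spark code: the class JitWarmup and the
methods sumOfSquares and timeRuns are made-up names for illustration, and
the loop is only a stand-in for expression evaluation.  It times the same
workload several times in one JVM; on a typical HotSpot JVM the early runs
tend to be the slowest, though the exact numbers depend on the JVM and the
machine.

```java
public class JitWarmup {
    // Written to on every run so the JIT cannot discard the work as dead code.
    static volatile long sink;

    // Hypothetical stand-in for expression evaluation: a tight arithmetic loop.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Runs the same workload `runs` times and returns elapsed nanoseconds per run.
    static long[] timeRuns(int runs, int n) {
        long[] elapsed = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink = sumOfSquares(n);
            elapsed[r] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] elapsed = timeRuns(10, 5_000_000);
        for (int r = 0; r < elapsed.length; r++) {
            System.out.printf("run %d: %d us%n", r + 1, elapsed[r] / 1000);
        }
    }
}
```

The same idea applies when benchmarking queries: run the query a few times
in the same application first and measure the later runs.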


> 2). Does Spark use the Hadoop's Map Reduce engine under the hood? If so
> can we configure it to use MR2 instead of MR1.
>

No, we do not use the MapReduce engine for execution.  You can, however,
compile Spark against either version of Hadoop so that you can access
HDFS, etc.

--001a113dbf4c232ed505218e0e24--