spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Armbrust <>
Subject Re: SparkSQL: First query execution is always slower than subsequent queries
Date Thu, 08 Oct 2015 01:47:50 GMT
-dev +user

1). Is that the reason why it's always slow in the first run? Or are there
> any other reasons? Apparently it loads data to memory every time so it
> shouldn't be something to do with disk read should it?

You are probably seeing the effect of the JVMs JIT.  The first run is
executing in interpreted mode.  Once the JVM sees its a hot piece of code
it will compile it to native code.  This applies both to Spark / Spark SQL
itself and (as of Spark 1.5) the code that we dynamically generate for
doing expression evaluation.  Multiple runs with the same expressions will
used cached code that might have been JITed.

> 2). Does Spark use the Hadoop's Map Reduce engine under the hood? If so
> can we configure it to use MR2 instead of MR1.

No, we do not use the map reduce engine for execution.  You can however
compile Spark to work with either version of hadoop for so you can access
HDFS, etc.

View raw message