From: Michael Armbrust
Date: Wed, 7 Oct 2015 18:47:50 -0700
Subject: Re: SparkSQL: First query execution is always slower than subsequent queries
To: Lloyd Haris
Cc: user

-dev +user

> 1). Is that the reason why it's always slow in the first run? Or are there
> any other reasons? Apparently it loads data to memory every time so it
> shouldn't be something to do with disk read should it?

You are probably seeing the effect of the JVM's JIT compiler. The first run executes in interpreted mode. Once the JVM sees that a piece of code is hot, it compiles it to native code. This applies both to Spark / Spark SQL itself and (as of Spark 1.5) to the code that we dynamically generate for expression evaluation. Multiple runs with the same expressions will use cached generated code that might already have been JITed.

> 2). Does Spark use the Hadoop's Map Reduce engine under the hood? If so
> can we configure it to use MR2 instead of MR1.

No, we do not use the MapReduce engine for execution. You can, however, compile Spark against either version of Hadoop so that you can access HDFS, etc.
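
A minimal sketch for observing the JIT warm-up effect described in the answer to question 1, outside of Spark: it times the same CPU-bound loop several times inside one JVM. The class name WarmupTiming, the workload, and the run counts are hypothetical and are not part of Spark. On a typical HotSpot JVM the first run or two tend to be slower than the later ones, but the actual numbers depend on the machine and JVM version.

```java
class WarmupTiming {
    // Hypothetical CPU-bound workload standing in for a query's expression evaluation.
    static long workload(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    // Runs the same workload `runs` times in this JVM and returns
    // each run's wall-clock time in nanoseconds.
    static long[] timeRuns(int runs, int n) {
        long[] elapsed = new long[runs];
        long sink = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += workload(n);
            elapsed[r] = System.nanoTime() - start;
        }
        // Use the accumulated result so the JIT cannot discard the work as dead code.
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] elapsed = timeRuns(10, 5_000_000);
        for (int r = 0; r < elapsed.length; r++) {
            System.out.printf("run %d: %.2f ms%n", r + 1, elapsed[r] / 1e6);
        }
    }
}
```

Running the same class with the HotSpot flag -Xint (interpreter only) should remove most of the gap between the first and later runs, which is one way to check that the difference comes from JIT compilation rather than from I/O.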