Subject: Re: Why dataframe can be more efficient than dataset?
From: Jörn Franke
Date: Sat, 8 Apr 2017 21:34:50 +0200
To: Shiyuan
Cc: user@spark.apache.org

As far as I am aware, in newer Spark versions a DataFrame is the same as Dataset[Row].
In fact, performance depends on so many factors that I am not sure such a comparison makes sense.

> On 8. Apr 2017, at 20:15, Shiyuan wrote:
>
> Hi Spark-users,
> I came across a few sources which mentioned that DataFrame can be more efficient than Dataset. I can understand this is true because Dataset allows functional transformations which Catalyst cannot look into and hence cannot optimize well. But can DataFrame be more efficient than Dataset even if we only use relational transformations on a Dataset? If so, can anyone explain why that is? Is there any benchmark comparing Dataset vs. DataFrame? Thank you!
>
> Shiyuan
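
The distinction raised in the quoted question, between relational and functional transformations, can be sketched in Scala roughly as follows. This is only a minimal illustration, assuming Spark 2.x is on the classpath and run in local mode; the `Person` case class, the object name and the sample rows are hypothetical and not taken from the thread. It prints the query plans so they can be compared, and makes no claim about actual performance.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object DataFrameVsDataset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local test run
      .appName("df-vs-ds-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds: Dataset[Person] = Seq(Person("a", 30), Person("b", 17)).toDS()
    // In Spark 2.x, DataFrame is a type alias for Dataset[Row].
    val df: DataFrame = ds.toDF()

    // Relational filter: a Column expression that Catalyst can inspect.
    df.filter($"age" > 18).explain(true)
    ds.filter($"age" > 18).explain(true)

    // Functional filter: an arbitrary Scala closure over Person objects,
    // which the optimizer treats as an opaque function.
    ds.filter(p => p.age > 18).explain(true)

    spark.stop()
  }
}
```

Comparing the three printed plans is one way to check, for a given Spark version, whether the relational filter on the Dataset is planned the same way as on the DataFrame, and how the closure-based filter differs.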