From: Michael Armbrust
Date: Tue, 25 Aug 2015 00:37:54 -0700
Subject: Re: What does Attribute and AttributeReference mean in Spark SQL
To: Todd
Cc: user@spark.apache.org

Attribute is the Catalyst name for an input column from a child operator. An AttributeReference has been resolved, meaning we know which input column in particular it is referring to. An AttributeReference also has a known DataType. In contrast, before analysis there might still exist UnresolvedAttributes, which are just string identifiers from a parsed query.
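A minimal, Spark-free Scala sketch of these ideas follows. It is a toy model, not Catalyst's actual classes: the names mirror Catalyst's, but the fields (a string for the data type, a Long for the id) and the resolve helper are simplifications assumed for illustration, and it assumes Scala 2.12 or later for the right-biased Either. It shows that an Attribute is just a leaf Expression, that a compound expression such as a + b has Attributes as children, and that resolution means picking the one column of the child operator's output whose name matches.

```scala
// Toy model of Catalyst-style expressions (NOT Spark's real classes).
object ToyCatalyst {
  sealed trait Expression {
    def children: Seq[Expression]
    // A node is resolved once all of its children are; leaves override this.
    def resolved: Boolean = children.forall(_.resolved)
  }

  // An Attribute is simply a leaf Expression that names an input column.
  sealed trait Attribute extends Expression {
    def name: String
    def children: Seq[Expression] = Nil
  }

  // What the parser produces: only a string identifier, no type yet.
  final case class UnresolvedAttribute(name: String) extends Attribute {
    override def resolved: Boolean = false
  }

  // What analysis produces: one specific input column with a known type
  // and an id (Catalyst prints this as a#0). Resolved by default (no children).
  final case class AttributeReference(name: String, dataType: String, exprId: Long)
      extends Attribute

  // A compound expression such as a + b; its children may be Attributes.
  final case class Add(left: Expression, right: Expression) extends Expression {
    def children: Seq[Expression] = Seq(left, right)
  }

  // Hypothetical analysis step: replace every UnresolvedAttribute with the
  // single column of the child operator's output that has the same name.
  def resolve(e: Expression, childOutput: Seq[AttributeReference]): Either[String, Expression] =
    e match {
      case UnresolvedAttribute(n) =>
        childOutput.filter(_.name == n) match {
          case Seq(col) => Right(col)
          case Seq()    => Left(s"cannot resolve '$n'")
          case _        => Left(s"'$n' is ambiguous")
        }
      case a: AttributeReference => Right(a)
      case Add(l, r) =>
        for {
          rl <- resolve(l, childOutput)
          rr <- resolve(r, childOutput)
        } yield Add(rl, rr)
    }

  def main(args: Array[String]): Unit = {
    val child = Seq(AttributeReference("a", "int", 0L), AttributeReference("b", "int", 1L))
    val parsed = Add(UnresolvedAttribute("a"), UnresolvedAttribute("b")) // 'a + 'b
    println(parsed.resolved)
    println(resolve(parsed, child))
    println(resolve(UnresolvedAttribute("c"), child))
  }
}
```

The sketch deliberately ignores case sensitivity, qualifiers, and nested fields; it only captures the name-to-column step that turns an unresolved identifier into a reference with a type and an id.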

An Expression can be more complex (like you suggested, a + b), though technically a on its own is also a very simple Expression. The following console session shows how these types are composed:
$ build/sbt sql/console

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.analysis._
import org.apache.spark.sql.catalyst.plans.logical._

import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._

sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5adfe37d
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@20d05227
import sqlContext.implicits._
import sqlContext._
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val unresolvedAttr: UnresolvedAttribute = 'a
unresolvedAttr: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = 'a

scala> val relation = LocalRelation('a.int)
relation: org.apache.spark.sql.catalyst.plans.logical.LocalRelation =
LocalRelation [a#0]

scala> val parsedQuery = relation.select(unresolvedAttr)
parsedQuery: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
'Project ['a]
 LocalRelation [a#0]

scala> parsedQuery.analyze
res11: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Project [a#0]
 LocalRelation [a#0]

The #0 after a is a unique identifier (an ExprId, unique within this JVM) that says where the data is coming from, even as plans are rearranged by optimizations.
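Why that id matters can be sketched the same way. In this toy model (again not Spark's API), two relations each expose a column named a. The hypothetical newAttribute helper hands out ids from an AtomicLong, an assumption standing in for how Catalyst allocates ids, and the hypothetical ordinalOf helper finds a column in an operator's output by id rather than by name or position, so the lookup still lands on the right column after a join's output order is swapped.

```scala
// Toy sketch (NOT Spark's API): why attributes carry a unique id.
object ToyExprIds {
  final case class AttributeReference(name: String, exprId: Long) {
    override def toString: String = s"$name#$exprId" // mimics Catalyst's a#0 display
  }

  // Hypothetical id allocator: one counter shared by the whole JVM (assumption).
  private val counter = new java.util.concurrent.atomic.AtomicLong(0L)
  def newAttribute(name: String): AttributeReference =
    AttributeReference(name, counter.getAndIncrement())

  // Hypothetical helper: position of `attr` in an operator's output, matched by id.
  def ordinalOf(attr: AttributeReference, output: Seq[AttributeReference]): Int = {
    val i = output.indexWhere(_.exprId == attr.exprId)
    require(i >= 0, s"$attr not found in ${output.mkString("[", ", ", "]")}")
    i
  }

  def main(args: Array[String]): Unit = {
    val leftA  = newAttribute("a")          // column a from the left relation
    val rightA = newAttribute("a")          // a different column, also named a
    val joinOutput    = Seq(leftA, rightA)  // output of "left JOIN right"
    val swappedOutput = Seq(rightA, leftA)  // same join after swapping sides

    // Matching by name is ambiguous (two hits); matching by id is not.
    println(joinOutput.count(_.name == "a"))
    println(ordinalOf(rightA, joinOutput))
    println(ordinalOf(rightA, swappedOutput))
  }
}
```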


On Mon, Aug 24, 2015 at 6:13 PM, Todd <bit1129@163.com> wrote:
> There are many case classes and concepts such as Attribute/AttributeReference/Expression in Spark SQL.
>
> I would like to ask what Attribute/AttributeReference/Expression mean. Given a SQL query like select a, b from c, are a and b two Attributes? Is a + b an Expression?
> It looks like I misunderstand it, because Attribute extends Expression in the code, which means an Attribute itself is an Expression.
>
> Thanks.
