spark-commits mailing list archives

From r...@apache.org
Subject spark git commit: [SPARK-15932][SQL][DOC] document the contract of encoder serializer expressions
Date Tue, 14 Jun 2016 05:02:26 GMT
Repository: spark
Updated Branches:
  refs/heads/master 1842cdd4e -> 688b6ef9d


[SPARK-15932][SQL][DOC] document the contract of encoder serializer expressions

## What changes were proposed in this pull request?

In our encoder framework, we implicitly assume that serializer expressions use `BoundReference`
to refer to the input object, and a lot of code depends on this contract (e.g. `ExpressionEncoder.tuple`).
This PR adds documentation and assertions to `ExpressionEncoder` to make the contract explicit.
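
As a rough illustration of the contract, here is a minimal, self-contained sketch. It does not use Spark: `Expr`, `BoundRef`, `AttrRef`, `Invoke`, and `SerializerContract` are hypothetical toy stand-ins for Catalyst's expression tree, and only the two checks in `check` mirror the assertions this patch adds.

```scala
// Toy stand-ins for Catalyst expressions. These are NOT Spark's classes;
// every name here is hypothetical and exists only for illustration.
sealed trait Expr {
  def children: Seq[Expr]

  // Collect results from every node in this tree (pre-order) where `pf` is defined.
  def collect[A](pf: PartialFunction[Expr, A]): Seq[A] =
    pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))

  // Names of all attributes referenced anywhere in this tree.
  def references: Set[String] = collect { case AttrRef(name) => name }.toSet
}

// Refers to the input object by position, not by attribute.
final case class BoundRef(ordinal: Int, dataType: String) extends Expr {
  def children: Seq[Expr] = Nil
}

// Refers to an output attribute of a child operator.
final case class AttrRef(name: String) extends Expr {
  def children: Seq[Expr] = Nil
}

// Calls a method on the value produced by `target`.
final case class Invoke(target: Expr, method: String) extends Expr {
  def children: Seq[Expr] = Seq(target)
}

object SerializerContract {
  // Mirrors the two assertions in the patch, applied to the toy tree.
  def check(serializer: Seq[Expr]): Unit = {
    assert(serializer.forall(_.references.isEmpty),
      "serializer cannot reference any attributes.")
    assert(serializer.flatMap(_.collect { case b: BoundRef => b }).distinct.length <= 1,
      "all serializer expressions must use the same BoundRef.")
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val input = BoundRef(0, "Person")

    // Both fields are read from the same bound input object: passes.
    SerializerContract.check(Seq(Invoke(input, "name"), Invoke(input, "age")))

    // Two different bound references: the second assertion fails.
    val mixed = Seq(Invoke(input, "name"), Invoke(BoundRef(1, "Person"), "age"))
    println(scala.util.Try(SerializerContract.check(mixed)).isFailure)
  }
}
```

In this sketch, a serializer whose expressions all read from one shared `BoundRef` passes, while one that mentions an `AttrRef` or mixes two distinct `BoundRef`s trips an assertion.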

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #13648 from cloud-fan/comment.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/688b6ef9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/688b6ef9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/688b6ef9

Branch: refs/heads/master
Commit: 688b6ef9dc0943d268fab7279ef50bfac1617f04
Parents: 1842cdd
Author: Wenchen Fan <wenchen@databricks.com>
Authored: Mon Jun 13 22:02:23 2016 -0700
Committer: Reynold Xin <rxin@databricks.com>
Committed: Mon Jun 13 22:02:23 2016 -0700

----------------------------------------------------------------------
 .../spark/sql/catalyst/encoders/ExpressionEncoder.scala     | 9 +++++++++
 1 file changed, 9 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/688b6ef9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
index 688082d..0023ce6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
@@ -197,6 +197,15 @@ case class ExpressionEncoder[T](
 
   if (flat) require(serializer.size == 1)
 
+  // serializer expressions are used to encode an object to a row, while the object is usually an
+  // intermediate value produced inside an operator, not from the output of the child operator. This
+  // is quite different from normal expressions, and `AttributeReference` doesn't work here
+  // (intermediate value is not an attribute). We assume that all serializer expressions use a same
+  // `BoundReference` to refer to the object, and throw exception if they don't.
+  assert(serializer.forall(_.references.isEmpty), "serializer cannot reference to any attributes.")
+  assert(serializer.flatMap(_.collect { case b: BoundReference => b}).distinct.length <= 1,
+    "all serializer expressions must use the same BoundReference.")
+
   /**
    * Returns a new copy of this encoder, where the `deserializer` is resolved and bound to the
    * given schema.



