spark-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26297) improve the doc of Distribution/Partitioning
Date Mon, 10 Dec 2018 19:28:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715425#comment-16715425 ]

ASF GitHub Bot commented on SPARK-26297:
----------------------------------------

gatorsmile commented on a change in pull request #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#discussion_r240347775
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ##########
 @@ -22,13 +22,12 @@ import org.apache.spark.sql.types.{DataType, IntegerType}
 
 /**
  * Specifies how tuples that share common expressions will be distributed when a query is executed
- * in parallel on many machines.  Distribution can be used to refer to two distinct physical
- * properties:
- *  - Inter-node partitioning of data: In this case the distribution describes how tuples are
- *    partitioned across physical machines in a cluster.  Knowing this property allows some
- *    operators (e.g., Aggregate) to perform partition local operations instead of global ones.
- *  - Intra-partition ordering of data: In this case the distribution describes guarantees made
- *    about how tuples are distributed within a single partition.
+ * in parallel on many machines.
+ *
+ * Distribution here refers to inter-node partitioning of data:
+ *   - The distribution describes how tuples are partitioned across physical machines in a cluster.
+ *     Knowing this property allows some operators (e.g., Aggregate) to perform partition local
+ *     operations instead of global ones.
 
 Review comment:
   How about?
   
   > Distribution here refers to inter-node partitioning of data. That is, it describes how tuples are partitioned across physical machines in a cluster. Knowing this property allows some operators (e.g., Aggregate) to perform partition local operations instead of global ones.
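
   For readers of the archive, the "partition local instead of global" point can be illustrated with a minimal, Spark-free sketch. Everything here (`PartitionLocalAggregate`, `hashPartition`, `sumByKey`) is a hypothetical name invented for illustration and is not part of Spark's `Distribution`/`Partitioning` API. It assumes rows are placed into partitions by hashing the grouping key, so no key spans two partitions:

```scala
// Hypothetical illustration only; none of these names are Spark APIs.
object PartitionLocalAggregate {
  type Row = (String, Int) // (grouping key, value)

  // Assign each row to a partition by hashing its key, so that all rows
  // sharing a key land in the same partition.
  def hashPartition(rows: Seq[Row], numPartitions: Int): Map[Int, Seq[Row]] =
    rows.groupBy { case (key, _) => Math.floorMod(key.hashCode, numPartitions) }

  // Sum the values per key within one collection of rows.
  def sumByKey(rows: Seq[Row]): Map[String, Int] =
    rows.groupBy(_._1).map { case (k, rs) => k -> rs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4, "b" -> 5)
    val partitions = hashPartition(rows, numPartitions = 3)

    // Each partition is aggregated independently. Because no key appears in
    // two partitions, combining the per-partition results needs no merge step.
    val local = partitions.values.flatMap(sumByKey).toMap

    // Reference result: aggregate all rows in one place.
    val global = sumByKey(rows)

    assert(local == global)
    println(local)
  }
}
```

   If the rows were instead spread across partitions without regard to the key, the per-partition sums for the same key would have to be merged in a second, global step. That extra step is what knowing the distribution lets an operator skip.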
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> improve the doc of Distribution/Partitioning
> --------------------------------------------
>
>                 Key: SPARK-26297
>                 URL: https://issues.apache.org/jira/browse/SPARK-26297
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

