spark-issues mailing list archives

From "Reynold Xin (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-12616) Union logical plan should support arbitrary number of children (rather than binary)
Date Wed, 20 Jan 2016 22:59:39 GMT

     [ https://issues.apache.org/jira/browse/SPARK-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-12616.
---------------------------------
       Resolution: Fixed
         Assignee: Xiao Li
    Fix Version/s: 2.0.0

> Union logical plan should support arbitrary number of children (rather than binary)
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-12616
>                 URL: https://issues.apache.org/jira/browse/SPARK-12616
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Xiao Li
>             Fix For: 2.0.0
>
>
> The Union logical plan is a binary node. However, a typical use case for union is to union
> a very large number of input sources (DataFrames, RDDs, or files). It is not uncommon to union
> hundreds of thousands of files. In this case, our optimizer can become very slow due to the
> large number of logical unions. We should change the Union logical plan to support an arbitrary
> number of children, and add a single rule in the optimizer (or analyzer?) to collapse all
> adjacent Unions into one.
> Note that this problem doesn't exist in the physical plan, because the physical Union already
> supports an arbitrary number of children.
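
The collapse rule described above can be sketched with a toy plan tree. This is only an illustration in Python, not Spark's Catalyst code: the `Relation` and `Union` classes and the `binary_union_chain`, `flatten_unions`, and `depth` helpers are hypothetical names invented here, and the sketch assumes union keeps duplicates (UNION ALL), so regrouping children does not change the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical leaf node standing in for one input source."""
    name: str


@dataclass(frozen=True)
class Union:
    """Hypothetical n-ary union node; children is a tuple of plan nodes."""
    children: tuple


def binary_union_chain(relations):
    """Build the left-deep tree that repeated binary unions produce."""
    plan = relations[0]
    for rel in relations[1:]:
        plan = Union((plan, rel))
    return plan


def flatten_unions(plan):
    """Collapse directly nested Union nodes into one Union, keeping child order.

    Uses an explicit stack so a chain of many unions does not hit
    Python's recursion limit.
    """
    if not isinstance(plan, Union):
        return plan
    flat = []
    stack = list(reversed(plan.children))
    while stack:
        node = stack.pop()
        if isinstance(node, Union):
            # Splice the nested union's children in place of the node itself.
            stack.extend(reversed(node.children))
        else:
            flat.append(node)
    return Union(tuple(flat))


def depth(plan):
    """Height of the plan tree, counting the root as level 1."""
    best, stack = 0, [(plan, 1)]
    while stack:
        node, level = stack.pop()
        best = max(best, level)
        if isinstance(node, Union):
            stack.extend((child, level + 1) for child in node.children)
    return best


if __name__ == "__main__":
    inputs = [Relation(f"file_{i}") for i in range(1000)]
    chained = binary_union_chain(inputs)
    collapsed = flatten_unions(chained)
    print(depth(chained), depth(collapsed))  # prints "1000 2"
```

In this toy model, unioning N inputs pairwise yields N - 1 nested nodes and a tree whose height grows with N, so any rule that walks the tree has that many union nodes to visit. After the collapse there is a single node with N children.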



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org