Date: Wed, 20 Jan 2016 22:59:39 +0000 (UTC)
From: "Reynold Xin (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Resolved] (SPARK-12616) Union logical plan should support arbitrary number of children (rather than binary)

     [ https://issues.apache.org/jira/browse/SPARK-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-12616.
---------------------------------
       Resolution: Fixed
         Assignee: Xiao Li
    Fix Version/s: 2.0.0

> Union logical plan should support arbitrary number of children (rather than binary)
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-12616
>                 URL: https://issues.apache.org/jira/browse/SPARK-12616
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Xiao Li
>             Fix For: 2.0.0
>
>
> The Union logical plan is a binary node. However, a typical use case for union is to combine a very large number of input sources (DataFrames, RDDs, or files). It is not uncommon to union hundreds of thousands of files. In this case, our optimizer can become very slow because of the large number of logical unions. We should change the Union logical plan to support an arbitrary number of children, and add a single rule in the optimizer (or analyzer?) to collapse all adjacent Unions into one.
> Note that this problem doesn't exist in the physical plan, because the physical Union already supports an arbitrary number of children.
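
The collapse step proposed in the description can be illustrated with a toy model. The sketch below is not Spark's Catalyst code: `Relation`, `Union`, `flatten_union`, `binary_union_chain`, and `depth` are hypothetical names invented for this example, and the plan tree is a plain Python structure. It only shows the idea of turning a chain of binary unions into a single n-ary node, assuming that any non-Union node can be treated as a leaf.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical leaf node standing in for one input source."""
    name: str


@dataclass(frozen=True)
class Union:
    """Hypothetical n-ary union node; children may be Relations or Unions."""
    children: Tuple[object, ...]


def flatten_union(plan):
    """Collapse a run of directly nested Union nodes into one n-ary Union.

    An explicit stack replaces recursion so that a very deep chain of
    binary unions does not hit Python's recursion limit.
    """
    if not isinstance(plan, Union):
        return plan
    flat = []
    stack = [plan]
    while stack:
        node = stack.pop()
        if isinstance(node, Union):
            # Push children in reverse so they are popped in original order.
            stack.extend(reversed(node.children))
        else:
            flat.append(node)
    return Union(tuple(flat))


def binary_union_chain(names):
    """Build the left-deep binary tree that repeated pairwise unions produce."""
    plan = Relation(names[0])
    for name in names[1:]:
        plan = Union((plan, Relation(name)))
    return plan


def depth(plan):
    """Count Union levels along the left spine of the plan."""
    levels = 0
    while isinstance(plan, Union):
        levels += 1
        plan = plan.children[0]
    return levels


if __name__ == "__main__":
    nested = binary_union_chain(["a", "b", "c", "d"])
    collapsed = flatten_union(nested)
    print(depth(nested), depth(collapsed), len(collapsed.children))
```

In this model a union of N sources built pairwise has N - 1 Union nodes stacked on the left spine, while the collapsed form has exactly one, which is the reduction in plan size that the issue is asking for.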