Date: Fri, 20 Jan 2017 23:45:26 +0000 (UTC)
From: "Barry Becker (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Created] (SPARK-19317) UnsupportedOperationException: empty.reduceLeft in LinearSeqOptimized

Barry Becker created SPARK-19317:
------------------------------------

             Summary: UnsupportedOperationException: empty.reduceLeft in LinearSeqOptimized
                 Key: SPARK-19317
                 URL: https://issues.apache.org/jira/browse/SPARK-19317
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: Barry Becker


I wish I had a simpler reproducible case to give, but I got the exception below while selecting null values in one of the columns of a DataFrame. My client code that failed was

df.filter(filterExp).count()

where the filter expression was something like someColumn.isNull.
There were 412 nulls out of 716,000 total rows for the column being filtered.
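For reference, here is a minimal sketch of the kind of call that fails. The DataFrame, column name, and data are placeholders made up for illustration, not the real job. The DataFrame is cached here because the stack trace goes through InMemoryTableScanExec, which only applies to cached data, though I can't confirm that caching alone is what triggers the error.

{code}
import org.apache.spark.sql.SparkSession

// Hypothetical stand-in for the real job; names and data are made up.
val spark = SparkSession.builder().appName("IsNullFilterSketch").getOrCreate()
import spark.implicits._

// A small column containing some nulls, cached so the scan goes through
// InMemoryTableScanExec as in the reported stack trace.
val df = Seq(Some(1), None, Some(3), None).toDF("someColumn").cache()

// Same pattern as the failing client code: filter on isNull, then count.
val filterExp = df("someColumn").isNull
val nullCount = df.filter(filterExp).count()
{code}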
It's odd because I have a different, smaller dataset where I did the same thing on a column with 100 nulls out of 800 rows and did not get the error.
The exception seems to indicate that Spark is trying to do a reduceLeft on an empty list; the full stack trace is below.
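As a plain-Scala illustration (not Spark code) of that failure mode: reduceLeft on an empty collection throws exactly this exception, and presumably the partial function at InMemoryTableScanExec.scala:90 ends up reducing an empty list while building the filter over the cached partition statistics. A reduceLeftOption-style guard (or an explicit emptiness check) avoids the throw.

{code}
// Plain Scala, not Spark: the generic failure mode seen in the trace.
val empty: List[Int] = List.empty

// Throws java.lang.UnsupportedOperationException: empty.reduceLeft
// val total = empty.reduceLeft(_ + _)

// A guarded alternative returns None instead of throwing.
val safeTotal: Option[Int] = empty.reduceLeftOption(_ + _)  // None
{code}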
{code}
java.lang.UnsupportedOperationException: empty.reduceLeft
scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:137)
scala.collection.immutable.List.reduceLeft(List.scala:84)
scala.collection.TraversableOnce$class.reduce(TraversableOnce.scala:208)
scala.collection.AbstractTraversable.reduce(Traversable.scala:104)
org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$1.applyOrElse(InMemoryTableScanExec.scala:90)
org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$1.applyOrElse(InMemoryTableScanExec.scala:54)
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$1.applyOrElse(InMemoryTableScanExec.scala:61)
org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$1.applyOrElse(InMemoryTableScanExec.scala:54)
scala.PartialFunction$Lifted.apply(PartialFunction.scala:223)
scala.PartialFunction$Lifted.apply(PartialFunction.scala:219)
org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$2.apply(InMemoryTableScanExec.scala:95)
org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$2.apply(InMemoryTableScanExec.scala:94)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
scala.collection.immutable.List.foreach(List.scala:381)
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
scala.collection.immutable.List.flatMap(List.scala:344)
org.apache.spark.sql.execution.columnar.InMemoryTableScanExec.<init>(InMemoryTableScanExec.scala:94)
org.apache.spark.sql.execution.SparkStrategies$InMemoryScans$$anonfun$6.apply(SparkStrategies.scala:306)
org.apache.spark.sql.execution.SparkStrategies$InMemoryScans$$anonfun$6.apply(SparkStrategies.scala:306)
org.apache.spark.sql.execution.SparkPlanner.pruneFilterProject(SparkPlanner.scala:96)
org.apache.spark.sql.execution.SparkStrategies$InMemoryScans$.apply(SparkStrategies.scala:302)
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
scala.collection.Iterator$class.foreach(Iterator.scala:893)
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
scala.collection.Iterator$class.foreach(Iterator.scala:893)
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:79)
org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:84)
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:84)
org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2774)
org.apache.spark.sql.Dataset.count(Dataset.scala:2404)
mypackage.Selection(Selection.scala:34)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org