spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10422) String column in InMemoryColumnarCache needs to override clone method
Date Wed, 02 Sep 2015 19:45:45 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727905#comment-14727905 ]

Apache Spark commented on SPARK-10422:
--------------------------------------

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/8578

> String column in InMemoryColumnarCache needs to override clone method
> ---------------------------------------------------------------------
>
>                 Key: SPARK-10422
>                 URL: https://issues.apache.org/jira/browse/SPARK-10422
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Yin Huai
>
> We have a clone method in {{ColumnType}} (https://github.com/apache/spark/blob/v1.5.0-rc3/sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala#L103). It seems we need to override it for String (https://github.com/apache/spark/blob/v1.5.0-rc3/sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala#L314) because we are dealing with UTF8String.
> {code}
> val df =
>   ctx.range(1, 30000).selectExpr("id % 500 as id").rdd.map(id => Tuple1(s"str_$id")).toDF("i")
> val cached = df.cache()
> cached.count()
> [info] - SPARK-10422: String column in InMemoryColumnarCache needs to override clone method *** FAILED *** (9 seconds, 152 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.util.NoSuchElementException: key not found: str_[0]
> [info] 	at scala.collection.MapLike$class.default(MapLike.scala:228)
> [info] 	at scala.collection.AbstractMap.default(Map.scala:58)
> [info] 	at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
> [info] 	at org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
> [info] 	at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
> [info] 	at org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
> [info] 	at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
> [info] 	at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
> [info] 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> [info] 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> [info] 	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> [info] 	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> [info] 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> [info] 	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
> [info] 	at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
> [info] 	at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
> {code}
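
The "key not found" failure above is what one would expect if the dictionary encoder stores string keys that still point at memory reused for later rows. The sketch below illustrates that mechanism in isolation and is not Spark code. BufferString, CloneDemo and lookupAfterReuse are hypothetical names, and cloneFn only stands in for the role of the clone method in ColumnType. It assumes the string value is a view over a byte array that the producer overwrites between rows.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Arrays
import scala.collection.mutable

// Hypothetical stand-in for a string that is only a view over a byte array.
// Equality and hashing depend on the current contents of that array.
final class BufferString(val bytes: Array[Byte]) {
  override def equals(other: Any): Boolean = other match {
    case that: BufferString => Arrays.equals(bytes, that.bytes)
    case _                  => false
  }
  override def hashCode(): Int = Arrays.hashCode(bytes)
  // Deep copy: the result owns its own byte array.
  def copy(): BufferString = new BufferString(bytes.clone())
}

object CloneDemo {
  private def utf8(s: String): Array[Byte] = s.getBytes(UTF_8)

  // Registers "str_0" in a dictionary, lets the producer reuse the backing
  // buffer for "str_1", then looks "str_0" up again.
  // cloneFn plays the role of the clone hook applied before a key is stored.
  def lookupAfterReuse(cloneFn: BufferString => BufferString): Option[Int] = {
    val buffer = utf8("str_0")
    val dictionary = mutable.HashMap.empty[BufferString, Int]
    dictionary(cloneFn(new BufferString(buffer))) = 0

    // The next row overwrites the same buffer in place.
    val next = utf8("str_1")
    System.arraycopy(next, 0, buffer, 0, buffer.length)

    // Later pass: look up the original value using independent bytes.
    dictionary.get(new BufferString(utf8("str_0")))
  }

  def main(args: Array[String]): Unit = {
    println(lookupAfterReuse(v => v))   // identity clone: prints None
    println(lookupAfterReuse(_.copy())) // copying clone: prints Some(0)
  }
}
```

With the identity clone the stored key silently changes when the buffer is overwritten, so the later lookup misses. With a copying clone the key keeps its own bytes and the lookup succeeds, which is the behavior an overridden clone for String would be expected to provide.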





