spark-issues mailing list archives

From "Takeshi Yamamuro (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21024) CSV parse mode handles Univocity parser exceptions
Date Fri, 09 Jun 2017 07:00:28 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044052#comment-16044052 ]

Takeshi Yamamuro commented on SPARK-21024:
------------------------------------------

Thanks! I'll open a new PR later.

> CSV parse mode handles Univocity parser exceptions
> --------------------------------------------------
>
>                 Key: SPARK-21024
>                 URL: https://issues.apache.org/jira/browse/SPARK-21024
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Takeshi Yamamuro
>            Priority: Minor
>
> The current master cannot skip illegal records that cause the Univocity parser to throw an exception.
> This comes from the spark-user mailing list:
> https://www.mail-archive.com/user@spark.apache.org/msg63985.html
> {code}
> scala> Seq("0,1", "0,1,2,3").toDF().write.text("/Users/maropu/Desktop/data")
> scala> val df = spark.read.format("csv").schema("a int, b int").option("maxColumns", "3").load("/Users/maropu/Desktop/data")
> scala> df.show
> com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - 3
> Hint: Number of columns processed may have exceeded limit of 3 columns. Use settings.setMaxColumns(int) to define the maximum number of columns your input can have
> Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
> Parser Configuration: CsvParserSettings:
>         Auto configuration enabled=true
>         Autodetect column delimiter=false
>         Autodetect quotes=false
>         Column reordering enabled=true
>         Empty value=null
>         Escape unquoted values=false
>         ...
> at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:339)
> at com.univocity.parsers.common.AbstractParser.handleEOF(AbstractParser.java:195)
> at com.univocity.parsers.common.AbstractParser.parseLine(AbstractParser.java:544)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:191)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:308)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:308)
> at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:60)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> ...
> {code}
> We could easily fix this as follows: https://github.com/apache/spark/compare/master...maropu:HandleExceptionInParser
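
For readers of the archive, the general idea in the report can be sketched without Spark: wrap the low-level parse call so that an exception thrown by the parser is treated as a malformed record and handled according to the parse mode. This is only an illustration under stated assumptions. It is not the code in the linked branch and not Spark's FailureSafeParser. The names ParseModeSketch, rawParse and parseAll are hypothetical, and rawParse is a stand-in that throws on over-wide records, loosely imitating the maxColumns failure shown above.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical sketch: none of these names come from Spark or Univocity.
object ParseModeSketch {
  sealed trait ParseMode
  case object Permissive extends ParseMode    // keep a placeholder for bad records
  case object DropMalformed extends ParseMode // silently skip bad records
  case object FailFast extends ParseMode      // abort on the first bad record

  // Stand-in for the low-level parser: throws when a record has more
  // than maxColumns fields (assumed behavior, for illustration only).
  def rawParse(line: String, maxColumns: Int): List[String] = {
    val tokens = line.split(",", -1).toList
    if (tokens.length > maxColumns) {
      throw new ArrayIndexOutOfBoundsException(maxColumns.toString)
    }
    tokens
  }

  // Convert parser exceptions into per-record outcomes driven by the mode.
  // Some(tokens) is a parsed record; None marks a bad record kept as a placeholder.
  def parseAll(
      lines: Seq[String],
      maxColumns: Int,
      mode: ParseMode): Seq[Option[List[String]]] =
    lines.flatMap { line =>
      Try(rawParse(line, maxColumns)) match {
        case Success(tokens) => Seq(Some(tokens))
        case Failure(e) =>
          mode match {
            case Permissive    => Seq(None)
            case DropMalformed => Seq.empty
            case FailFast =>
              throw new RuntimeException(s"Malformed record: $line", e)
          }
      }
    }
}
```

With the two records from the report ("0,1" and "0,1,2,3") and a limit of 3 columns, the DropMalformed mode in this sketch keeps only the first record, Permissive keeps the first record plus a placeholder for the second, and FailFast throws.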



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

