spark-issues mailing list archives

From "Maxim Gekk (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-24329) Remove comments filtering before parsing of CSV files
Date Mon, 21 May 2018 08:18:00 GMT
Maxim Gekk created SPARK-24329:
----------------------------------

             Summary: Remove comments filtering before parsing of CSV files
                 Key: SPARK-24329
                 URL: https://issues.apache.org/jira/browse/SPARK-24329
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Maxim Gekk


Comment and whitespace filtering is already performed by the uniVocity parser, according
to the parser settings:
https://github.com/apache/spark/blob/branch-2.3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L178-L180

It is not necessary to do the same before parsing. We need to inspect all places where the filterCommentAndEmpty
method is called, and remove each call that duplicates the filtering already done by the uniVocity parser.
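
To illustrate why a pre-parse filter can be redundant, here is a minimal sketch using only the Scala standard library. It is not Spark or uniVocity code: `ParserSettings`, `preFilter` and `parse` are hypothetical stand-ins, and the toy parser is simply assumed to skip comment lines and blank lines on its own, as the issue says the configured uniVocity parser does.

```scala
// Minimal sketch, standard library only. All names are hypothetical and only
// mimic the behaviour described in the issue; this is not Spark code.
object RedundantFilterSketch {
  // Stand-in for the parser settings: comment marker and field delimiter.
  final case class ParserSettings(comment: Char = '#', delimiter: Char = ',')

  // Stand-in for a pre-parse filter in the spirit of filterCommentAndEmpty:
  // drops blank lines and lines starting with the comment marker.
  def preFilter(lines: Iterator[String], s: ParserSettings): Iterator[String] =
    lines.filter(l => l.trim.nonEmpty && !l.startsWith(s.comment.toString))

  // Toy parser that (by assumption) already skips comments and blank lines
  // itself before splitting each remaining line into trimmed fields.
  def parse(lines: Iterator[String], s: ParserSettings): List[List[String]] =
    lines
      .filter(l => l.trim.nonEmpty && !l.startsWith(s.comment.toString))
      .map(_.split(s.delimiter).map(_.trim).toList)
      .toList

  def main(args: Array[String]): Unit = {
    val settings = ParserSettings()
    val input = List("# a comment line", "a, b", "", "   ", "c, d")

    val withPreFilter = parse(preFilter(input.iterator, settings), settings)
    val withoutPreFilter = parse(input.iterator, settings)

    // Same rows either way, so the extra pass over the input adds nothing.
    assert(withPreFilter == withoutPreFilter)
    println(withoutPreFilter)
  }
}
```

The equivalence holds only where the pre-filter and the parser apply the same rules, which is why each call site has to be inspected individually rather than removed wholesale.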




