spark-issues mailing list archives

From "Ruslan Dautkhanov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-23554) Hive's textinputformat.record.delimiter equivalent in Spark
Date Thu, 01 Mar 2018 20:58:00 GMT
Ruslan Dautkhanov created SPARK-23554:
-----------------------------------------

             Summary: Hive's textinputformat.record.delimiter equivalent in Spark
                 Key: SPARK-23554
                 URL: https://issues.apache.org/jira/browse/SPARK-23554
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 2.3.0, 2.2.1
            Reporter: Ruslan Dautkhanov


It would be great if Spark supported an option similar to Hive's {{textinputformat.record.delimiter}}
in the spark-csv reader.

We currently have to create Hive tables to work around this functionality, which is missing
natively in Spark.

{{textinputformat.record.delimiter}} was introduced back in 2011, in the MapReduce era;
see MAPREDUCE-2254.

As an example, one of our most common use cases for {{textinputformat.record.delimiter}}
is reading multiple lines of text that together make up one "record". The number of lines per
"record" varies, so {{textinputformat.record.delimiter}} is a great way for us to process
these files natively in Hadoop/Spark: a custom .map() function does the actual processing
of those records, and we then convert the result to a dataframe.
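
For illustration only, here is a minimal PySpark sketch of the kind of processing described above, going through the Hadoop input format directly rather than through a Hive table. It is not taken from the reporter's code: the helper names read_records and parse_record, the blank-line delimiter, and the "first line is an id" record layout are all assumptions made for the example. It relies on SparkContext.newAPIHadoopFile and its conf argument to pass {{textinputformat.record.delimiter}} to Hadoop's TextInputFormat.

```python
# Sketch under stated assumptions: `sc` is an existing SparkContext, and the
# record layout handled by parse_record is hypothetical.
HADOOP_TEXT_INPUT = "org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
LONG_WRITABLE = "org.apache.hadoop.io.LongWritable"
TEXT = "org.apache.hadoop.io.Text"


def read_records(sc, path, delimiter):
    """Return an RDD of raw records split on `delimiter` instead of newline."""
    pairs = sc.newAPIHadoopFile(
        path,
        HADOOP_TEXT_INPUT,
        LONG_WRITABLE,
        TEXT,
        conf={"textinputformat.record.delimiter": delimiter},
    )
    # Each element is a (byte offset, record text) pair; keep only the text.
    return pairs.map(lambda kv: kv[1])


def parse_record(raw):
    """Hypothetical parser: the first non-empty line is an id, the rest is the body."""
    lines = [line.strip() for line in raw.split("\n") if line.strip()]
    return (lines[0], " ".join(lines[1:]))
```

With a SparkSession named spark, the records could then be turned into a dataframe along the lines of spark.createDataFrame(read_records(spark.sparkContext, path, "\n\n").filter(lambda r: r.strip()).map(parse_record), ["id", "body"]), where path and the column names are again placeholders. The filter step drops empty records so that parse_record always sees at least one line.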



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

