spark-issues mailing list archives

From "Ruslan Dautkhanov (JIRA)" <>
Subject [jira] [Created] (SPARK-23554) Hive's textinputformat.record.delimiter equivalent in Spark
Date Thu, 01 Mar 2018 20:58:00 GMT
Ruslan Dautkhanov created SPARK-23554:

             Summary: Hive's textinputformat.record.delimiter equivalent in Spark
                 Key: SPARK-23554
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 2.3.0, 2.2.1
            Reporter: Ruslan Dautkhanov

It would be great if Spark supported an option similar to Hive's {{textinputformat.record.delimiter}}
in the spark-csv reader.

We currently have to create Hive tables to work around the lack of this functionality
in native Spark.

{{textinputformat.record.delimiter}} was introduced back in 2011, in the MapReduce era;
see MAPREDUCE-2254.

As an example, one of our most common use cases for {{textinputformat.record.delimiter}}
is reading multiple lines of text that together make up a "record". The number of lines per record
varies, so {{textinputformat.record.delimiter}} is a great solution for processing
these files natively in Hadoop/Spark: a custom .map() function then does the actual processing
of those records, and we convert the result to a DataFrame.
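
A minimal sketch of that workaround in Scala, assuming Spark 2.x with Hadoop's "new API" {{TextInputFormat}} on the classpath. The Hadoop setting is applied to a copy of the SparkContext's Hadoop configuration and passed to {{SparkContext.newAPIHadoopFile}}, so each RDD element is one multi-line record. The blank-line separator ({{"\n\n"}}), the {{key=value}} record layout, and the names {{MultiLineRecords}}, {{recordsRdd}}, {{parseRecord}} and {{toDataFrame}} are hypothetical, chosen only for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; the names and the record layout are illustrative assumptions.
object MultiLineRecords {

  // Read `path` as an RDD of records separated by `delimiter` instead of by newlines.
  def recordsRdd(spark: SparkSession, path: String, delimiter: String): RDD[String] = {
    // Copy the Hadoop configuration so the custom delimiter applies only to this read.
    val conf = new Configuration(spark.sparkContext.hadoopConfiguration)
    conf.set("textinputformat.record.delimiter", delimiter)
    spark.sparkContext
      .newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
      // The record reader reuses Text objects, so convert each value to a String immediately.
      .map { case (_, text) => text.toString.trim }
      .filter(_.nonEmpty)
  }

  // Assumed layout: one "key=value" pair per line; returns (id, name, number of lines).
  def parseRecord(record: String): (String, String, Int) = {
    val lines = record.split("\n").toSeq
    val fields = lines.flatMap { line =>
      line.split("=", 2) match {
        case Array(k, v) => Some(k.trim -> v.trim)
        case _           => None
      }
    }.toMap
    (fields.getOrElse("id", ""), fields.getOrElse("name", ""), lines.size)
  }

  // The "custom .map()" step: parse each record, then convert the result to a DataFrame.
  def toDataFrame(spark: SparkSession, path: String, delimiter: String): DataFrame = {
    import spark.implicits._
    recordsRdd(spark, path, delimiter).map(parseRecord).toDF("id", "name", "line_count")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("record-delimiter-sketch").getOrCreate()
    try {
      // Sample input: records separated by a blank line, each with a varying number of lines.
      val sample = "id=1\nname=alpha\nnote=first record\n\nid=2\nname=beta\n"
      val file = Files.createTempFile("records", ".txt")
      Files.write(file, sample.getBytes(StandardCharsets.UTF_8))
      toDataFrame(spark, file.toString, "\n\n").show(truncate = false)
    } finally {
      spark.stop()
    }
  }
}
```

Copying the configuration keeps the custom delimiter from affecting other reads on the same SparkContext. The delimiter string is encoded as UTF-8 and matched against the raw bytes of the file, so input with Windows line endings would need {{"\r\n\r\n"}} instead.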

This message was sent by Atlassian JIRA
