spark-issues mailing list archives

From "Apache Spark (JIRA)" <>
Subject [jira] [Assigned] (SPARK-15982) Harmonize the behavior of DataFrameReader.text/csv/json/parquet/orc
Date Fri, 17 Jun 2016 03:41:05 GMT


Apache Spark reassigned SPARK-15982:

    Assignee: Apache Spark  (was: Tathagata Das)

> Harmonize the behavior of DataFrameReader.text/csv/json/parquet/orc
> -------------------------------------------------------------------
>                 Key: SPARK-15982
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Tathagata Das
>            Assignee: Apache Spark
> Issues with current reader behavior:
> - `text()` without args returns an empty DF with no columns. This is inconsistent: `text()` is expected to always return a DF with a `value` string field.
> - `textFile()` without args fails with an exception for the same reason: it expects the DF returned by `text()` to have a `value` field.
> - `orc()` does not take varargs, which is inconsistent with the other formats.
> - `json(single-arg)` was removed, but that caused source compatibility issues (SPARK-16009).
> The solution I am implementing is the following:
> 1. For each format, there will be a single-argument method and a vararg method. For json, parquet, csv, and text, this means adding json(string), etc. For orc, this means adding
> 2. Remove the special handling in text(), csv(), etc. that returns an empty DataFrame with no fields. Instead, pass the empty sequence of paths on to the data source and let each data source handle it correctly. For example, the text data source should return an empty DF with schema (value: string).
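
The two-step proposal in the quoted description can be sketched with a small toy model. This is not Spark code: `SketchFrame`, `SketchSource`, `SketchTextSource`, and `SketchReader` are hypothetical names invented for illustration, and the "rows" are placeholders rather than file contents. The sketch only shows the shape of the design, assuming a reader that offers a single-argument and a vararg method per format and forwards even an empty path list to the data source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a DataFrame: a schema plus some rows.
final class SketchFrame {
    final List<String> schema;
    final List<String> rows;

    SketchFrame(List<String> schema, List<String> rows) {
        this.schema = schema;
        this.rows = rows;
    }
}

// Hypothetical data source: receives the path list as given, even when empty.
interface SketchSource {
    SketchFrame load(List<String> paths);
}

// Toy text source: the schema is always (value: string), whatever the paths.
final class SketchTextSource implements SketchSource {
    @Override
    public SketchFrame load(List<String> paths) {
        List<String> schema = Collections.singletonList("value: string");
        List<String> rows = new ArrayList<>();
        // No file I/O in this sketch: each path contributes one placeholder row.
        for (String path : paths) {
            rows.add("<contents of " + path + ">");
        }
        return new SketchFrame(schema, rows);
    }
}

// Toy reader: one single-argument method and one vararg method per format.
final class SketchReader {
    private final SketchSource textSource = new SketchTextSource();

    // Single-argument method, present so existing one-path callers keep compiling.
    SketchFrame text(String path) {
        return text(new String[] { path });
    }

    // Vararg method: no special case for zero paths, the source decides.
    SketchFrame text(String... paths) {
        return textSource.load(Arrays.asList(paths));
    }
}
```

With this shape, `new SketchReader().text()` yields a frame with no rows but still the single `value: string` field, so a caller that relies on that field (as `textFile()` does in the description) has something to select. The reader itself no longer needs to know what an "empty" result looks like for each format.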

This message was sent by Atlassian JIRA
