spark-dev mailing list archives

From Burak Yavuz <>
Subject Re: CSV Support in SparkR
Date Tue, 02 Jun 2015 19:03:11 GMT

cc'ing Shivaram here, because he worked on this yesterday.

If I'm not mistaken, you can use the following workflow:
```./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3```

and then

```df <- read.df(sqlContext, "/data", "csv", header = "true")```


On Tue, Jun 2, 2015 at 11:52 AM, Eskilson,Aleksander <> wrote:

>  Are there any intentions to provide first-class support for CSV files as
> one of the loadable file types in SparkR? Databricks’ spark-csv API [1]
> has support for SQL, Python, and Java/Scala, and implements most of the
> arguments of R’s read.table API [2], but currently there is no way to load
> CSV data in SparkR (1.4.0) besides separating our headers from the data,
> loading into an RDD, splitting by our delimiter, and then converting to a
> SparkR DataFrame with a vector of the columns gathered from the header.
>  Regards,
>  Alek Eskilson
>  [1] --
> [2] --
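
For reference, the manual workaround described above — separating the header from the data, splitting each line by the delimiter, and gathering columns from the header — can be sketched without Spark in plain Python (the CSV content here is purely illustrative):

```python
# Illustrative stand-in for the manual SparkR workaround: split the header
# row off from the data, split each record on the delimiter, then assemble
# column-named records (analogous to building a DataFrame from an RDD of
# split lines plus a vector of header columns).
raw = "name,age\nalice,30\nbob,25\n"  # hypothetical CSV content

lines = raw.splitlines()
header = lines[0].split(",")                     # separate the header
rows = [line.split(",") for line in lines[1:]]   # split by delimiter

# Zip each row against the header to get named columns.
records = [dict(zip(header, row)) for row in rows]
print(records)  # [{'name': 'alice', 'age': '30'}, {'name': 'bob', 'age': '25'}]
```

The spark-csv package does the same splitting and header handling for you, which is why launching with `--packages` and calling `read.df` is the simpler path.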
