spark-issues mailing list archives

From "Steve Loughran (Jira)" <>
Subject [jira] [Commented] (SPARK-36024) Switch the datasource example due to the depreciation of the dataset
Date Tue, 06 Jul 2021 09:33:00 GMT


Steve Loughran commented on SPARK-36024:

similar to HADOOP-17784

I'm "in discussions" with them. Maybe I can persuade them to leave the index file up.

And I'd like to move on to a dataset which (a) is stable and (b) has real ORC/Parquet data alongside
the CSV.

Finally: we need to make sure that this time, no matter how "stable" the source is, whoever
runs it knows we need it.

Where in the docs is this?

> Switch the datasource example due to the depreciation of the dataset
> --------------------------------------------------------------------
>                 Key: SPARK-36024
>                 URL:
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 3.1.2
>            Reporter: Leona Yoda
>            Priority: Trivial
> The S3 bucket that is used for an example in the "Integration with Cloud Infrastructures" document will be deleted on Jul 1, 2021.
> The dataset will move to another bucket, but it requires the `--request-payer requester` option, so users have to pay the S3 cost.
> So I think it's better to change the datasource like this.
> I chose NYC Taxi data here for an example.
> Unlike the Landsat data it's not compressed, but it is just an example and there are several tutorials using it with Spark.
> Read test result
> {code:java}
> scala> sc.textFile("s3a://nyc-tlc/misc/taxi _zone_lookup.csv").take(10).foreach(println)
> "LocationID","Borough","Zone","service_zone"
> 1,"EWR","Newark Airport","EWR"
> 2,"Queens","Jamaica Bay","Boro Zone"
> 3,"Bronx","Allerton/Pelham Gardens","Boro Zone"
> 4,"Manhattan","Alphabet City","Yellow Zone"
> 5,"Staten Island","Arden Heights","Boro Zone"
> 6,"Staten Island","Arrochar/Fort Wadsworth","Boro Zone"
> 7,"Queens","Astoria","Boro Zone"
> 8,"Queens","Astoria Park","Boro Zone"
> 9,"Queens","Auburndale","Boro
> {code}
