spark-user mailing list archives

From Pat Ferrel
Subject Re: File list read into single RDD
Date Tue, 29 Apr 2014 02:15:52 GMT

BTW just so I know where to look next time, was that in some docs?

On Apr 28, 2014, at 7:04 PM, Nicholas Chammas wrote:

Yep, as I just found out, you can also provide sc.textFile() with a comma-delimited string
of all the files you want to load.

For example:

So once you have your list of files, concatenate their paths like that and pass the single
string to textFile().
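
A minimal sketch of this approach in PySpark, assuming the files live on a local (or locally mounted) filesystem that os.walk can traverse. The helper names matching_files, comma_joined and read_files are hypothetical, not part of Spark; sc is assumed to be an existing SparkContext. Because the path string is split on commas, the sketch rejects any path that itself contains one.

```python
import os
import re


def matching_files(root, pattern):
    """Walk `root` recursively; return sorted paths whose file name matches `pattern`."""
    regex = re.compile(pattern)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if regex.search(name):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def comma_joined(paths):
    """Join paths into the single comma-delimited string passed to textFile()."""
    if not paths:
        raise ValueError("no files matched")
    for p in paths:
        # A comma inside a path would be read as a separator between two paths.
        if "," in p:
            raise ValueError("path contains a comma: %r" % p)
    return ",".join(paths)


def read_files(sc, root, pattern):
    """Read every matching file into one RDD; `sc` is an existing SparkContext."""
    return sc.textFile(comma_joined(matching_files(root, pattern)))
```

With a live SparkContext, a call might look like read_files(sc, "/data/logs", r"^part-\d+\.txt$"), where both the directory and the pattern are placeholders. For files on HDFS the directory walk would need a different listing mechanism; only the join and the textFile() call carry over.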


On Mon, Apr 28, 2014 at 7:23 PM, Pat Ferrel wrote:
sc.textFile(URI) supports reading multiple files in parallel but only with a wildcard. I need
to walk a dir tree, match a regex to create a list of files, then I’d like to read them
into a single RDD in parallel. I understand these could go into separate RDDs then a union
RDD can be created. Is there a way to create a single RDD from a URI list?
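
For comparison, the separate-RDDs-then-union route mentioned above could be sketched in PySpark as follows. The function name read_as_union is hypothetical; sc is assumed to be an existing SparkContext and paths an already-built list of file URIs.

```python
def read_as_union(sc, paths):
    """Read each path into its own RDD, then combine them into one union RDD."""
    rdds = [sc.textFile(p) for p in paths]
    # SparkContext.union() takes a list of RDDs and returns a single RDD.
    return sc.union(rdds)
```

This creates one RDD per file before combining them, which is the extra step the question is trying to avoid.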
