spark-user mailing list archives

From Alex Nastetsky <alex.nastet...@vervemobile.com>
Subject CompositeInputFormat in Spark
Date Sat, 31 Oct 2015 03:53:34 GMT
Does Spark have an implementation similar to CompositeInputFormat in
MapReduce?

CompositeInputFormat joins multiple datasets before the mapper runs. The
datasets must be partitioned the same way, with the same number of
partitions. It uses the "part" number in each file name to work out which
file to join with its counterparts in the other datasets.
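
To illustrate the file-matching step, here is a rough sketch in plain Python
(not Hadoop or Spark code). The helper names and example paths are made up
for illustration, and it assumes the usual part-00000 / part-r-00000 style
output names.

```python
import re
from collections import defaultdict

# Assumed naming: "part-00000" (old API) or "part-r-00000" / "part-m-00000".
PART_RE = re.compile(r"part-(?:[a-z]-)?(\d+)")

def part_number(filename):
    """Extract the numeric part index from an output file name."""
    match = PART_RE.search(filename)
    if match is None:
        raise ValueError("no part number in %r" % filename)
    return int(match.group(1))

def pair_part_files(*datasets):
    """Group the files of several datasets by their part number.

    Each dataset is a list of file names. Only part numbers that occur
    in every dataset are kept, since a join needs all counterparts.
    """
    by_part = defaultdict(list)
    for files in datasets:
        for name in files:
            by_part[part_number(name)].append(name)
    return {
        part: names
        for part, names in sorted(by_part.items())
        if len(names) == len(datasets)
    }

if __name__ == "__main__":
    orders = ["orders/part-r-00000", "orders/part-r-00001"]
    users = ["users/part-r-00001", "users/part-r-00000"]
    for part, names in pair_part_files(orders, users).items():
        print(part, names)
```

Each group of files can then be joined on its own, without looking at any
other part.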

Here is a similar question from earlier this year:

http://mail-archives.us.apache.org/mod_mbox/spark-user/201505.mbox/%3CCADrn=epWL6GHs9hfYO3csuxhShTyCsrLbuJCMPxrTz4ZyPEVvw@mail.gmail.com%3E

From what I can tell, there's no way to tell Spark how a dataset was
previously partitioned. The only option is to repartition it in order to
achieve a map-side join with a similarly partitioned dataset.
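
To show why matching partitioning is enough for a map-side join, here is a
small simulation in plain Python. It is not Spark API; the function names
are hypothetical, and Python's built-in hash() stands in for whatever
partitioner the datasets share.

```python
from collections import defaultdict

def partition_by_key(records, num_partitions):
    """Split (key, value) records into partitions by hashing the key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def join_partition(left, right):
    """Inner-join two partitions locally with an in-memory hash index."""
    index = defaultdict(list)
    for key, value in left:
        index[key].append(value)
    return [
        (key, (lv, rv))
        for key, rv in right
        for lv in index.get(key, [])
    ]

def copartitioned_join(left, right, num_partitions):
    """Join two datasets partition by partition.

    Both sides use the same partitioner and partition count, so equal
    keys always land in the same partition index and no record has to
    move between partitions during the join.
    """
    left_parts = partition_by_key(left, num_partitions)
    right_parts = partition_by_key(right, num_partitions)
    joined = []
    for lpart, rpart in zip(left_parts, right_parts):
        joined.extend(join_partition(lpart, rpart))
    return joined

if __name__ == "__main__":
    users = [(1, "alice"), (2, "bob"), (3, "carol")]
    orders = [(1, "book"), (3, "pen"), (4, "lamp")]
    print(sorted(copartitioned_join(users, orders, 2)))
```

If the two datasets were already written out with this layout, the
partition_by_key step is exactly the work I would like to skip.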
