beam-commits mailing list archives

From "Amit Sela (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-673) Data locality for Read.Bounded
Date Mon, 26 Sep 2016 08:42:21 GMT
Amit Sela created BEAM-673:
------------------------------

             Summary: Data locality for Read.Bounded
                 Key: BEAM-673
                 URL: https://issues.apache.org/jira/browse/BEAM-673
             Project: Beam
          Issue Type: Bug
          Components: runner-spark
            Reporter: Amit Sela
            Assignee: Amit Sela


On some distributed filesystems, such as HDFS, the Spark runner should be able to hint to Spark
the preferred locations of each split, so that read tasks can be scheduled close to the data.
Here is an example of how Spark does that for Hadoop RDDs:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L252
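
To make the idea concrete, here is a minimal, self-contained sketch in plain Java. It does not use Spark or Beam classes: `Split`, `preferredLocations` and `chooseExecutor` are hypothetical names invented for illustration, and dropping the "localhost" placeholder is an assumption of this sketch. In Spark itself the corresponding hook is `RDD.getPreferredLocations`, which the Hadoop RDD linked above fills from the input split's locations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreferredLocationsSketch {

    /** Hypothetical stand-in for one split of a bounded source. */
    static final class Split {
        final String id;
        final List<String> hosts; // hosts holding this split's data, e.g. HDFS block replicas

        Split(String id, List<String> hosts) {
            this.id = id;
            this.hosts = hosts;
        }
    }

    /** Locality hint for a split: its replica hosts, minus the "localhost" placeholder. */
    static List<String> preferredLocations(Split split) {
        List<String> result = new ArrayList<>();
        for (String host : split.hosts) {
            if (!"localhost".equals(host)) {
                result.add(host);
            }
        }
        return result;
    }

    /**
     * Toy scheduler: pick the first preferred host that is also an available
     * executor; fall back to the first executor when no preferred host is available.
     */
    static String chooseExecutor(Split split, List<String> executors) {
        if (executors.isEmpty()) {
            throw new IllegalArgumentException("no executors available");
        }
        for (String host : preferredLocations(split)) {
            if (executors.contains(host)) {
                return host;
            }
        }
        return executors.get(0);
    }

    public static void main(String[] args) {
        Split split = new Split("split-0", Arrays.asList("localhost", "node-b", "node-c"));
        List<String> executors = Arrays.asList("node-a", "node-c");
        System.out.println(preferredLocations(split));
        System.out.println(chooseExecutor(split, executors));
    }
}
```

The hint is only a preference: when none of the preferred hosts is available, the task still runs elsewhere and reads the data remotely.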

*Note: in the case of a 1-to-1 mapping of a Read operation (e.g. TextIO), direct translation
should still be preferred, but this is pending HDFS support in Beam anyway.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
