beam-commits mailing list archives

From Ismaël Mejía (JIRA) <j...@apache.org>
Subject [jira] [Commented] (BEAM-673) Data locality for Read.Bounded
Date Mon, 24 Apr 2017 20:06:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981796#comment-15981796 ]

Ismaël Mejía commented on BEAM-673:
-----------------------------------

Not really. The goal of adding this to FSR (first stable release) was to make the API
change before the Source API freezes for stability, even if at first it is not implemented
at all, or is implemented for just one runner (Spark) and probably one source (HDFS). I think
the only thing we need is to add a method to hint the location for sources, and this method
can even have a default implementation that returns an empty list, so runners would implement
this in an opt-in fashion.
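
To make the idea concrete, here is a minimal sketch of what such an opt-in hint could look like. All names here (LocalityAwareSource, getPreferredLocations, describePlacement, the example host names) are hypothetical and chosen only for illustration; this is not the actual Beam Source API, and the real signature would be settled in the issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LocalityHintSketch {

    /** Hypothetical stand-in for a source; NOT the real Beam Source API. */
    interface LocalityAwareSource {
        /**
         * Hosts where this source's data is believed to live.
         * The default is an empty list, meaning "no preference", so existing
         * sources need no changes.
         */
        default List<String> getPreferredLocations() {
            return Collections.emptyList();
        }
    }

    /** A source that does not know about locality; it inherits the default. */
    static class InMemorySource implements LocalityAwareSource {}

    /** A source that opts in, for example one backed by HDFS blocks (assumed). */
    static class BlockBackedSource implements LocalityAwareSource {
        private final List<String> hosts;

        BlockBackedSource(List<String> hosts) {
            this.hosts = hosts;
        }

        @Override
        public List<String> getPreferredLocations() {
            return hosts;
        }
    }

    /** Runner side: use the hint when present, otherwise schedule anywhere. */
    static String describePlacement(LocalityAwareSource source) {
        List<String> hosts = source.getPreferredLocations();
        return hosts.isEmpty() ? "any worker" : "prefer " + String.join(",", hosts);
    }

    public static void main(String[] args) {
        System.out.println(describePlacement(new InMemorySource()));
        System.out.println(describePlacement(
            new BlockBackedSource(Arrays.asList("node-1", "node-2"))));
    }
}
```

A runner that ignores the method behaves exactly as before, and a source that does not override it reports no preference, which is what makes the change safe to add before the API freeze.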

> Data locality for Read.Bounded
> ------------------------------
>
>                 Key: BEAM-673
>                 URL: https://issues.apache.org/jira/browse/BEAM-673
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Amit Sela
>            Assignee: Ismaël Mejía
>             Fix For: First stable release
>
>
> In some distributed filesystems, such as HDFS, we should be able to hint to Spark the preferred locations of splits.
> Here is an example of how Spark does that for Hadoop RDDs:
> https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L249



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
