beam-commits mailing list archives

From "Davor Bonaci (JIRA)" <>
Subject [jira] [Commented] (BEAM-1556) Spark executors need to register IO factories
Date Tue, 28 Feb 2017 01:05:45 GMT


Davor Bonaci commented on BEAM-1556:

It certainly is an issue in other runners as well.

I wouldn't do it in the context of a {{FileBasedSource}}. Users should be able to call the {{FileSystem}}
API from, say, the {{@ProcessElement}} method of a {{DoFn}}. So, I think the registration should
be done before any "user code" is invoked.
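Below is a minimal sketch of the ordering argued for above: the worker registers file systems before it invokes any user function, so a {{DoFn}}-style body that resolves a {{gs://}} path works. Every name here ({{REGISTRY}}, {{registerFileSystems}}, {{resolve}}, {{invokeUserCode}}, the {{gcpProject}} option key) is hypothetical and stands in for runner internals. None of it is Beam's or Spark's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins for a runner's worker; not Beam's actual API.
public class RegisterBeforeUserCode {

    // Hypothetical registry mapping a URI scheme (e.g. "gs") to a file system name.
    static final Map<String, String> REGISTRY = new HashMap<>();

    // Hypothetical registration hook, similar in spirit to
    // IOChannelUtils.registerIOFactories(options).
    static void registerFileSystems(Map<String, String> options) {
        REGISTRY.put("file", "LocalFileSystem");
        if (options.containsKey("gcpProject")) {
            REGISTRY.put("gs", "GcsFileSystem");
        }
    }

    // Resolves a path the way user code might; fails if its scheme is unregistered.
    static String resolve(String path) {
        int idx = path.indexOf("://");
        String scheme = idx < 0 ? "file" : path.substring(0, idx);
        String fs = REGISTRY.get(scheme);
        if (fs == null) {
            throw new IllegalStateException("No filesystem registered for scheme: " + scheme);
        }
        return fs;
    }

    // Worker-side wrapper: registration always happens before the user function runs.
    static <T, R> R invokeUserCode(Map<String, String> options, Function<T, R> userFn, T input) {
        registerFileSystems(options);
        return userFn.apply(input);
    }

    public static void main(String[] args) {
        Map<String, String> options = Map.of("gcpProject", "my-project");
        // Stands in for a DoFn's @ProcessElement body that calls the file system API.
        Function<String, String> userFn = RegisterBeforeUserCode::resolve;
        System.out.println(invokeUserCode(options, userFn, "gs://bucket/input.txt"));
    }
}
```

Calling {{resolve("gs://...")}} before {{registerFileSystems}} throws. This failure mode is the one the issue describes for WordCount on Spark executors.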

Doing it at worker startup might not be ideal -- the constructor takes {{PipelineOptions}}
as an argument. Since jobs could have different options, it probably needs to happen on a
per-task basis, likely at the point the worker receives the task from the master and deserializes
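Here is a rough sketch of per-task registration, assuming a worker that receives each task together with its job's serialized options. The {{key=value;...}} wire format, {{onTaskReceived}}, and {{registerFileSystems}} are all hypothetical placeholders for whatever the runner actually uses. The worker re-registers only when a task's options differ from the last ones it applied.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical per-task initialization on a worker; names are illustrative,
// not Beam or Spark APIs.
public class PerTaskRegistration {

    // Options most recently used for registration on this worker (hypothetical).
    private Map<String, String> activeOptions = null;
    private int registrations = 0;

    // Assumed wire format "key=value;key=value" standing in for serialized PipelineOptions.
    static Map<String, String> deserializeOptions(String serialized) {
        Map<String, String> options = new HashMap<>();
        for (String pair : serialized.split(";")) {
            if (pair.isEmpty()) {
                continue;
            }
            String[] kv = pair.split("=", 2);
            options.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return options;
    }

    // Called when a task arrives from the master, before any user code runs.
    void onTaskReceived(String serializedOptions, Runnable userCode) {
        Map<String, String> options = deserializeOptions(serializedOptions);
        synchronized (this) {
            // Different jobs sharing a worker may carry different options,
            // so re-register whenever they change.
            if (!Objects.equals(options, activeOptions)) {
                registerFileSystems(options);
                activeOptions = options;
            }
        }
        userCode.run();
    }

    // Placeholder for the real registration call
    // (e.g. IOChannelUtils.registerIOFactories(options)); here it only counts calls.
    void registerFileSystems(Map<String, String> options) {
        registrations++;
    }

    synchronized int registrationCount() {
        return registrations;
    }

    public static void main(String[] args) {
        PerTaskRegistration worker = new PerTaskRegistration();
        worker.onTaskReceived("project=a", () -> {});
        worker.onTaskReceived("project=a", () -> {}); // same options: no re-registration
        worker.onTaskReceived("project=b", () -> {}); // different options: re-register
        System.out.println(worker.registrationCount()); // prints 2
    }
}
```

The lock only guards the check-and-register step. If registration mutates process-wide state, tasks from different jobs running at the same time on one executor could still interfere. A real runner would need to scope registration per job or per task.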

> Spark executors need to register IO factories
> ---------------------------------------------
>                 Key: BEAM-1556
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Frances Perry
>            Assignee: Jean-Baptiste Onofré
> The Spark executors need to call IOChannelUtils.registerIOFactories(options) in order
> to support GCS files and make the default WordCount example work.
> Context in this thread:

This message was sent by Atlassian JIRA
