beam-commits mailing list archives

From "Amit Sela (JIRA)" <>
Subject [jira] [Commented] (BEAM-59) IOChannelFactory rethinking/redesign
Date Thu, 06 Oct 2016 21:00:24 GMT


Amit Sela commented on BEAM-59:

We've discussed data locality on the dev list, and it seems to make sense that
if HDFS can provide this information to a runner, we should use it.
[] and [~jbonofre], do you think it should be part of this effort? I
opened BEAM-673 for the Spark runner when it seemed there was no interest in this as part
of the SDK, but the discussion has grown since, and there now seems to be agreement that it
should be part of HdfsIO or some HdfsUtils.

> IOChannelFactory rethinking/redesign
> ------------------------------------
>                 Key: BEAM-59
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core, sdk-java-gcp
>            Reporter: Daniel Halperin
> Right now, FileBasedSource and FileBasedSink communication is mediated by IOChannelFactory.
> There are a number of issues:
> * Global configuration -- e.g., all 'gs://' URIs use the same credentials. This should
>   be per-source/per-sink/etc.
> * Supported APIs -- currently IOChannelFactory is in the "non-public API" util package
>   and subject to change. We need users to be able to add new backends ('s3://', 'hdfs://', etc.)
>   directly, without fear that they will be broken.
> * Per-backend features: e.g., creating buckets in GCS/s3, setting expiration time, etc.
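
To make the first two issues in the description concrete, here is a minimal sketch of one possible shape for a redesign: a registry keyed by URI scheme whose factories take options per source rather than reading global state. This is not Beam's API and not the design the issue settled on; every name in it (BackendOptions, FileBackend, BackendRegistry, credentialsId) is hypothetical and chosen only for illustration.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-source/per-sink options (not a Beam type).
final class BackendOptions {
    final String credentialsId;

    BackendOptions(String credentialsId) {
        this.credentialsId = credentialsId;
    }
}

// Hypothetical handle to a storage backend, bound to the options it was opened with.
interface FileBackend {
    String describe(URI resource);
}

// Hypothetical public registry: users add backends by URI scheme ("gs", "s3", "hdfs", ...).
final class BackendRegistry {
    private final Map<String, Function<BackendOptions, FileBackend>> factories =
            new ConcurrentHashMap<>();

    void register(String scheme, Function<BackendOptions, FileBackend> factory) {
        factories.put(scheme.toLowerCase(Locale.ROOT), factory);
    }

    // Options are passed on every call, so two sources that share a scheme
    // can still use different credentials.
    FileBackend open(URI resource, BackendOptions options) {
        String scheme = resource.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + resource);
        }
        Function<BackendOptions, FileBackend> factory =
                factories.get(scheme.toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException("No backend registered for scheme: " + scheme);
        }
        return factory.apply(options);
    }
}

final class BackendRegistryDemo {
    public static void main(String[] args) {
        BackendRegistry registry = new BackendRegistry();
        // A stand-in "gs" backend that only reports which credentials it was given.
        registry.register("gs", opts -> uri -> opts.credentialsId + "@" + uri);

        URI input = URI.create("gs://bucket-a/input.txt");
        FileBackend teamA = registry.open(input, new BackendOptions("team-a"));
        FileBackend teamB = registry.open(input, new BackendOptions("team-b"));
        System.out.println(teamA.describe(input));
        System.out.println(teamB.describe(input));
    }
}
```

In this sketch the credentials travel with each open call instead of living in a global setting, and a new scheme is added through a public register method. Per-backend features such as bucket creation are left out; they would presumably need backend-specific interfaces beyond this common one.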

