beam-commits mailing list archives

From "Stephen Sisk (JIRA)" <>
Subject [jira] [Created] (BEAM-1878) IO ITs: how to handle custom docker images?
Date Tue, 04 Apr 2017 17:55:41 GMT
Stephen Sisk created BEAM-1878:

             Summary: IO ITs: how to handle custom docker images?
                 Key: BEAM-1878
             Project: Beam
          Issue Type: Improvement
          Components: sdk-java-extensions
            Reporter: Stephen Sisk
            Assignee: Stephen Sisk


For IO ITs whose data stores need custom docker images in order to run, we
can't currently use those data stores in a kubernetes cluster (which is where
we host our data stores). I have a couple of options for how to solve this and
am looking for feedback from folks involved in creating IO ITs/opinions on


We've discussed in the past that we'll want to allow developers to submit
just a dockerfile, and then we'll use that when creating the data store on
kubernetes. This is the case for ElasticsearchIO and I assume more data
stores in the future will want to do this. It's also looking like it'll be
necessary to use custom docker images for the HadoopInputFormatIO's
cassandra ITs - to run a cassandra cluster, there doesn't seem to be a good
image you can use out of the box.

In either case, in order to retrieve a docker image, kubernetes needs a
container registry - it will read the docker images from there. A simple
private container registry doesn't work because kubernetes config files are
static - this means that if local devs try to use the kubernetes files, the
files will point at the private container registry and the devs won't be able
to retrieve the images, since they don't have access. They'd have to manually
edit the files, which in theory is an option, but I don't consider that to
be acceptable since it feels pretty unfriendly (it is simple, so if we
really don't like the options below we can revisit it.)

Quick summary of the options


We can:

* Start using something like Kubernetes Helm - this adds more dependencies and
a small amount of complexity (this is my recommendation, but only by a little)

* Start pushing images to docker hub - this means they'll be publicly
visible and raises the bar for maintenance of those images

* Host our own public container registry - this means running our own
public service with costs, etc.
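
To make the first option concrete: the underlying idea is to stop hard-coding
the registry in the kubernetes files and fill it in at deploy time. Below is a
minimal Python sketch of that idea only. It is not Helm and not anything that
exists in Beam; the manifest fragment, the pod/image names (`cassandra-it`) and
the `render_manifest` helper are all hypothetical, chosen for illustration.

```python
from string import Template

# Hypothetical manifest fragment. The pod name and image name are
# illustrative assumptions, not taken from any real Beam config file.
MANIFEST_TEMPLATE = Template("""\
apiVersion: v1
kind: Pod
metadata:
  name: cassandra-it
spec:
  containers:
  - name: cassandra
    image: ${registry}/cassandra-it:${tag}
""")


def render_manifest(registry: str, tag: str = "latest") -> str:
    """Return the manifest text with the registry and tag filled in.

    Each user passes a registry they can actually pull from, so the
    checked-in file never has to name a private registry.
    """
    return MANIFEST_TEMPLATE.substitute(registry=registry, tag=tag)


if __name__ == "__main__":
    # A local dev might point at a registry running on their own machine.
    print(render_manifest("localhost:5000"))
```

With something along these lines, a local dev and the shared cluster would use
the same checked-in file and differ only in the registry value they supply,
rather than hand-editing the config.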

I discussed the options in detail in my original email to dev@:

I ran into this question while working on getting the HIFIO cassandra cluster
running, so I might prototype with that.

