beam-user mailing list archives

From Lukasz Cwik <lc...@google.com>
Subject Re: Help with adding python package dependencies when executing python pipeline
Date Tue, 03 Jul 2018 21:13:43 GMT
Take a look at
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/

On Tue, Jul 3, 2018 at 2:09 PM OrielResearch Eila Arich-Landkof <
eila@orielresearch.org> wrote:

> Hello all,
>
>
> I am using Python code to run my pipeline, similar to the following:
>
> options = PipelineOptions()
> google_cloud_options = options.view_as(GoogleCloudOptions)
> google_cloud_options.project = 'my-project-id'
> google_cloud_options.job_name = 'myjob'
> google_cloud_options.staging_location = 'gs://your-bucket-name-here/staging'
> google_cloud_options.temp_location = 'gs://your-bucket-name-here/temp'
> options.view_as(StandardOptions).runner = 'DataflowRunner'
>
>
>
> I would like to add *pandas-gbq* package installation to my workers. What
> would be the recommended way to do so? Can I add it to the
> PipelineOptions()?
> I remember that there are a few options; one of them involved creating a
> requirements text file, but I cannot remember where I saw it or whether it
> is the simplest way when running the pipeline from Datalab.
>
> Thank you for any reference!
>
> --
> Eila
> www.orielresearch.org
> https://www.meetup.com/Deep-Learning-In-Production/
>
>
>
