beam-commits mailing list archives

From "Scott Jungwirth (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
Date Thu, 13 Sep 2018 20:51:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614047#comment-16614047
] 

Scott Jungwirth edited comment on BEAM-3106 at 9/13/18 8:50 PM:
----------------------------------------------------------------

I just ran into this issue using Google's Cloud Composer (managed Airflow) after adding the
2.6.0 (current latest) Beam SDK PyPI package (apache-beam[gcp]>=2.6.0). Judging from the
build log, apache-beam[gcp] caused a downgrade of some other google-cloud packages:

 
{code:java}
...
 Installing collected packages: pydot, fastavro, pytz, google-cloud-core, google-cloud-bigquery,
apache-beam, pysftp, google-cloud-firestore, msgpack, cachecontrol, firebase-admin, webob,
bugsnag
 Found existing installation: pytz 2018.5
 Uninstalling pytz-2018.5:
 Successfully uninstalled pytz-2018.5
 Found existing installation: google-cloud-core 0.28.1
 Uninstalling google-cloud-core-0.28.1:
 Successfully uninstalled google-cloud-core-0.28.1
 Found existing installation: google-cloud-bigquery 1.5.0
 Uninstalling google-cloud-bigquery-1.5.0:
 Successfully uninstalled google-cloud-bigquery-1.5.0
 Found existing installation: apache-beam 2.5.0
 Uninstalling apache-beam-2.5.0:
 Successfully uninstalled apache-beam-2.5.0
 Successfully installed apache-beam-2.6.0 bugsnag-3.4.3 cachecontrol-0.12.5 fastavro-0.19.7
firebase-admin-2.13.0 google-cloud-bigquery-0.25.0 google-cloud-core-0.25.0 google-cloud-firestore-0.29.0
msgpack-0.5.6 pydot-1.2.4 pysftp-0.2.9 pytz-2018.4 webob-1.8.2{code}
I tracked this down to the pinned requirement for bigquery: {{google-cloud-bigquery==0.25.0}}  [https://github.com/apache/beam/blob/v2.6.0/sdks/python/setup.py#L140]

 

This led to the following warnings from pipdeptree:

 
{code:java}
$ pipdeptree --warn
Warning!!! Possibly conflicting dependencies found:
* google-cloud-bigquery==0.25.0
- google-cloud-core [required: <0.26dev,>=0.25.0, installed: 0.28.1]
* google-cloud-pubsub==0.26.0
- google-cloud-core [required: <0.26dev,>=0.25.0, installed: 0.28.1]
* google-cloud-dataflow==2.5.0
- apache-beam [required: ==2.5.0, installed: 2.6.0]
* pandas-gbq==0.6.0
- google-cloud-bigquery [required: >=0.32.0, installed: 0.25.0]{code}
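Each warning above amounts to checking an installed version against a comma-separated requirement string. A minimal Python sketch of that check follows. It is not how pipdeptree or pip implement it: the helper names `parse_version` and `satisfies` are hypothetical, and the parser is deliberately simplified. It keeps only the leading digits of each component, so `0.26dev` is treated as `0.26` rather than as a pre-release.

```python
import operator
import re

# Comparison operators supported by this simplified sketch.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '0.28.1' or '0.26dev' into a tuple of ints (suffixes are dropped)."""
    numbers = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def satisfies(installed, requirement):
    """Return True if `installed` meets every clause of `requirement`."""
    have = parse_version(installed)
    for clause in requirement.split(","):
        match = re.match(r"\s*(==|!=|>=|<=|>|<)\s*(\S+)\s*$", clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        op, wanted_text = match.groups()
        wanted = parse_version(wanted_text)
        # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
        width = max(len(have), len(wanted))
        left = have + (0,) * (width - len(have))
        right = wanted + (0,) * (width - len(wanted))
        if not _OPS[op](left, right):
            return False
    return True


# Pairs taken from the pipdeptree output above.
print(satisfies("0.28.1", "<0.26dev,>=0.25.0"))  # google-cloud-core
print(satisfies("0.25.0", ">=0.32.0"))           # google-cloud-bigquery
print(satisfies("2.6.0", "==2.5.0"))             # apache-beam
```

All three calls print `False`, which matches the conflicts pipdeptree reports.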
 

The exception I was getting came from a different Google Cloud module, google-cloud-storage:

 
{code:java}
File "/usr/local/lib/python2.7/site-packages/google/cloud/storage/blob.py", line 535, in download_to_file
 ...
 File "/usr/local/lib/python2.7/site-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
 response = func() 
 File "/usr/local/lib/python2.7/site-packages/google_auth_httplib2.py", line 198, in request 
 uri, method, body=body, headers=request_headers, **kwargs) 
 TypeError: request() got an unexpected keyword argument 'data'{code}
I was able to work around this issue by explicitly installing the desired versions of the google-cloud-core>=0.28.0
and google-cloud-bigquery>=1.5.0 modules after the apache-beam[gcp]>=2.6.0 module.
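
The install order described above can be sketched in Python as follows. This is only an illustration, assuming pip is driven through `python -m pip`; the helper names `build_install_commands` and `run_install` are hypothetical, and the requirement strings are the ones quoted in this comment. pip may still report that the resulting environment conflicts with Beam's pin.

```python
import subprocess
import sys

# Requirement strings taken from the comment above.
BEAM_SPEC = "apache-beam[gcp]>=2.6.0"
OVERRIDE_SPECS = ["google-cloud-core>=0.28.0", "google-cloud-bigquery>=1.5.0"]


def build_install_commands(first, overrides):
    """Return two pip command lines: `first` alone, then the overrides together."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [pip + [first], pip + list(overrides)]


def run_install(commands, runner=subprocess.check_call):
    """Run each command in order; check_call raises on the first failure."""
    for command in commands:
        runner(command)


# To actually perform the installs (this modifies the current environment):
# run_install(build_install_commands(BEAM_SPEC, OVERRIDE_SPECS))
```

Running the overrides as a second, separate pip invocation is what lets the later versions replace the ones that the Beam install pulled in.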

 

 


was (Author: sjungwirth):
I just ran into this issue using Google's Cloud Composer (managed Airflow) after adding the
2.6.0 (current latest) Beam SDK PyPI package (apache-beam[gcp]>=2.6.0). Judging from the
build log, apache-beam[gcp] caused a downgrade of some other google-cloud packages:

 
{code:java}
...
 Installing collected packages: pydot, fastavro, pytz, google-cloud-core, google-cloud-bigquery,
apache-beam, pysftp, google-cloud-firestore, msgpack, cachecontrol, firebase-admin, webob,
bugsnag
 Found existing installation: pytz 2018.5
 Uninstalling pytz-2018.5:
 Successfully uninstalled pytz-2018.5
 Found existing installation: google-cloud-core 0.28.1
 Uninstalling google-cloud-core-0.28.1:
 Successfully uninstalled google-cloud-core-0.28.1
 Found existing installation: google-cloud-bigquery 1.5.0
 Uninstalling google-cloud-bigquery-1.5.0:
 Successfully uninstalled google-cloud-bigquery-1.5.0
 Found existing installation: apache-beam 2.5.0
 Uninstalling apache-beam-2.5.0:
 Successfully uninstalled apache-beam-2.5.0
 Successfully installed apache-beam-2.6.0 bugsnag-3.4.3 cachecontrol-0.12.5 fastavro-0.19.7
firebase-admin-2.13.0 google-cloud-bigquery-0.25.0 google-cloud-core-0.25.0 google-cloud-firestore-0.29.0
msgpack-0.5.6 pydot-1.2.4 pysftp-0.2.9 pytz-2018.4 webob-1.8.2{code}

 I tracked this down to the pinned requirement for bigquery: {{google-cloud-bigquery==0.25.0}}  [https://github.com/apache/beam/blob/v2.6.0/sdks/python/setup.py#L140]

 

This led to the following warnings from pipdeptree:

 
{code:java}
$ pipdeptree --warn
Warning!!! Possibly conflicting dependencies found:
* google-cloud-storage==1.10.0
- google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
* google-cloud-firestore==0.29.0
- google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
* pandas-gbq==0.6.0
- google-cloud-bigquery [required: >=0.32.0, installed: 0.25.0]
* google-cloud-dataflow==2.5.0
- apache-beam [required: ==2.5.0, installed: 2.6.0]
* google-cloud-logging==1.6.0
- google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]{code}
 

 

The exception I was getting came from a different Google Cloud module, google-cloud-storage:

 
{code:java}
File "/usr/local/lib/python2.7/site-packages/google/cloud/storage/blob.py", line 535, in download_to_file
 ...
 File "/usr/local/lib/python2.7/site-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
 response = func() 
 File "/usr/local/lib/python2.7/site-packages/google_auth_httplib2.py", line 198, in request 
 uri, method, body=body, headers=request_headers, **kwargs) 
 TypeError: request() got an unexpected keyword argument 'data'{code}

I was able to work around this issue by explicitly installing the desired versions of
the google-cloud-core>=0.28.0 and google-cloud-bigquery>=1.5.0 modules after the apache-beam[gcp]>=2.6.0
module.

 

 

> Consider not pinning all python dependencies, or moving them to requirements.txt
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-3106
>                 URL: https://issues.apache.org/jira/browse/BEAM-3106
>             Project: Beam
>          Issue Type: Wish
>          Components: build-system
>    Affects Versions: 2.1.0
>         Environment: python
>            Reporter: Maximilian Roos
>            Priority: Major
>
> Currently all Python dependencies are [pinned or capped|https://github.com/apache/beam/blob/master/sdks/python/setup.py#L97].
> While there's a good argument for supplying a `requirements.txt` with well-tested dependencies,
> having them specified in `setup.py` forces them to an exact state on each install of Beam.
> This makes using Beam in any environment with other libraries nigh on impossible.
> This is particularly severe for the `gcp` dependencies, where we have libraries that
> won't work with an older version (but Beam _does_ work with a newer version). We have to
> do a bunch of gymnastics to get the correct versions installed because of this. Unfortunately,
> Airflow repeats this practice and conflicts on a number of dependencies, adding further complication
> (but, again, there is no real conflict).
> I haven't seen this practice outside of the Apache & Google ecosystem; for example,
> no libraries in numerical Python do this. Here's a [discussion on SO|https://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions].
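
The split proposed in the issue description can be illustrated with a small Python sketch. It is not Beam's actual `setup.py`: the helper names are hypothetical, the two versions are simply the ones that appear in the install log in the comment above, and using them as lower bounds is an assumption made for illustration, not a tested compatibility claim.

```python
# Versions a project has tested against (taken from the install log above).
TESTED_VERSIONS = {
    "google-cloud-bigquery": "0.25.0",
    "google-cloud-core": "0.25.0",
}


def setup_py_requirements(tested):
    """Lower bounds only: the kind of list passed as install_requires in setup.py."""
    return ["%s>=%s" % (name, version) for name, version in sorted(tested.items())]


def requirements_txt_lines(tested):
    """Exact pins: lines for a requirements.txt that records a tested environment."""
    return ["%s==%s" % (name, version) for name, version in sorted(tested.items())]


print(setup_py_requirements(TESTED_VERSIONS))
print(requirements_txt_lines(TESTED_VERSIONS))
```

With ranges in `setup.py`, installing the package leaves newer versions already in the environment untouched, while the exact pins remain available to anyone who wants to reproduce the tested set.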



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
