beam-user mailing list archives

From Dmitry Demeshchuk <dmi...@postmates.com>
Subject Re: Installing non-native Python dependencies in Dataflow
Date Fri, 09 Jun 2017 01:21:53 GMT
FYI, I tried to install a psycopg2 wheel from a file using the
"extra_packages" argument (although wheel installation is apparently
still an experimental feature), but this ran into a UCS-2 vs UCS-4
compatibility problem (it looks like the Dataflow build of Python uses
UCS-2, while wheels for Linux generally target UCS-4).

What ended up working for me ultimately, though, is an approach similar to
juliaset, with a few small differences:
https://gist.github.com/doubleyou/27bf3abb0fc77a2bc9257e6adc5cfe8f

Note two things here:

1. We import the "install" class from setuptools, not from distutils. This,
in fact, has been the core problem for me. I haven't yet checked whether the
juliaset example works for me at all, but I strongly suspect that it may
not, precisely because of this issue.

2. We handle commands in a simpler fashion, by just using one single class.
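
The two points above can be sketched roughly as follows. This is not the content of the gist, only an illustration under stated assumptions: the package name, version, command list, and the helper names (CUSTOM_COMMANDS, run_custom_commands, CustomInstall, SETUP_KWARGS) are hypothetical, and it assumes pip drives "setup.py install" on the worker the way the log further down shows.

```python
import subprocess

# Point 1: take `install` from setuptools, not distutils. pip passes
# setuptools-only options such as --single-version-externally-managed,
# which the plain distutils command does not recognize.
from setuptools.command.install import install

# Hypothetical native dependencies; libpq-dev is what psycopg2 builds against.
CUSTOM_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "--assume-yes", "install", "libpq-dev"],
]


def run_custom_commands(commands, runner=subprocess.check_call):
    """Run each command in order; check_call raises on a non-zero exit."""
    for command in commands:
        runner(command)


class CustomInstall(install):
    """Point 2: one command class that installs native deps, then the package."""

    def run(self):
        run_custom_commands(CUSTOM_COMMANDS)
        install.run(self)


# Hypothetical metadata for the pipeline package.
SETUP_KWARGS = dict(
    name="dataflow",
    version="0.0.1",
    install_requires=["psycopg2"],
    cmdclass={"install": CustomInstall},
)

# In a real setup.py, finish with: setuptools.setup(**SETUP_KWARGS)
```

Because the commands hang off "install" rather than "build", they run where the package is installed (the worker) instead of on the machine that submits the pipeline.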

I'll make a Jira ticket later today or tomorrow to reflect my findings,
maybe make a pull request if I confirm that juliaset is not universally
working either, if that's fine.

On Tue, Jun 6, 2017 at 8:46 PM, Dmitry Demeshchuk <dmitry@postmates.com>
wrote:

> Yeah, I wasn't really pinning it myself, it's one of the dependency
> packages that depends on that specific version.
>
> Thanks for the information, I'll try to explicitly install 33.1.1 and see
> if it changes anything.
>
> On Tue, Jun 6, 2017 at 7:13 PM, Ahmet Altay <altay@google.com> wrote:
>
>> Pinning setuptools is generally not a good practice. The reason is that at
>> installation time it might cause removal of the setuptools that is
>> being used to install packages.
>>
>> FWIW, Dataflow workers should have setuptools 33.1.1, which was released
>> on 2017/01/16.
>>
>> Ahmet
>>
>> On Tue, Jun 6, 2017 at 6:53 PM, Dmitry Demeshchuk <dmitry@postmates.com>
>> wrote:
>>
>>> Thanks, Ahmet, it really turned out that Stackdriver had more logs than
>>> just the Dataflow logs section.
>>>
>>> So, I ended up seeing this installation step that fails every time:
>>>
>>> I    Running setup.py install for dataflow: started
>>> I      Running setup.py install for dataflow: finished with status 'error'
>>> I      Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sHw6oI-record/install-record.txt --single-version-externally-managed --compile:
>>> I      usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
>>> I         or: -c --help [cmd1 cmd2 ...]
>>> I         or: -c --help-commands
>>> I         or: -c cmd --help
>>> I
>>> I      error: option --single-version-externally-managed not recognized
>>> I
>>> I      ----------------------------------------
>>> I  Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sHw6oI-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-bXyST4-build/
>>> I  /usr/local/bin/pip failed with exit status 1
>>>
>>>
>>> This seems to mean that the natively installed setuptools is too old,
>>> and the command has been generated with a newer version of setuptools
>>> (specifically, my project has setuptools==36.0.1 as a dependency of some
>>> package). I'm still digging through the Stackdriver logs, but so far I
>>> couldn't find the exact reason for the failure.
>>>
>>> Also talking to the Dataflow folks, maybe they'll have a better idea.
>>> I'll also try to compare this to the output of successful pipelines and see
>>> if it gives me any ideas.
>>>
>>> Thank you.
>>>
>>> On Tue, Jun 6, 2017 at 4:40 PM, Ahmet Altay <altay@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jun 6, 2017 at 2:07 PM, Dmitry Demeshchuk <dmitry@postmates.com>
>>>> wrote:
>>>>
>>>>> Hi Ahmet,
>>>>>
>>>>> Thanks a lot for pointing out that doc, I somehow missed it from the
>>>>> official Python SDK page!
>>>>>
>>>>> One thing that comes to my mind is that generally one should probably
>>>>> use the 'install' command in setuptools, not 'build', as is done in
>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py#L113.
>>>>> The reason is that the 'build' step seems to be executed on the original
>>>>> machine, not inside the runner's containers, while 'install' will be
>>>>> triggered inside of them. If I run a pipeline that uses setup.py with a
>>>>> "build" step, it fails due to being unable to "apt-get install libpq-dev"
>>>>> on a Mac.
>>>>>
>>>>
>>>> Thank you. I believe this example should work similarly with the
>>>> install command. Also, if possible, please file a JIRA issue with your
>>>> ideas and we can work on improving things.
>>>>
>>>>
>>>>>
>>>>> I'm still trying to make it work with either build or install steps,
>>>>> talking to the Dataflow folks in parallel to get more understanding of
>>>>> what I'm doing wrong (Dataflow doesn't send out installation failure
>>>>> logs to Stackdriver, only runtime logs, so it seems).
>>>>>
>>>>
>>>> Have you tried looking at the worker-startup logs? All of the logs
>>>> should be in Stackdriver.
>>>>
>>>>
>>>>>
>>>>> On Tue, Jun 6, 2017 at 9:21 AM, Ahmet Altay <altay@google.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Please see Managing Python Pipeline Dependencies [1] for various ways
>>>>>> of installing additional dependencies. The section on non-Python
>>>>>> dependencies is relevant to your question.
>>>>>>
>>>>>> Thank you,
>>>>>> Ahmet
>>>>>>
>>>>>> [1] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>>>>>>
>>>>>> On Mon, Jun 5, 2017 at 11:52 PM, Morand, Sebastien <
>>>>>> sebastien.morand@veolia.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Interested too. It would be nice, for instance, to add an SFTP
>>>>>>> BoundedSource, but that requires compiling paramiko against the SSL
>>>>>>> library (and so installing ssl-dev).
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> *Sébastien MORAND*
>>>>>>> Team Lead Solution Architect
>>>>>>> Technology & Operations / Digital Factory
>>>>>>> Veolia - Group Information Systems & Technology (IS&T)
>>>>>>> Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08
>>>>>>> Bureau 0144C (Ouest)
>>>>>>> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
>>>>>>> *www.veolia.com <http://www.veolia.com>*
>>>>>>>
>>>>>>> On 6 June 2017 at 08:01, Dmitry Demeshchuk <dmitry@postmates.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi again, folks,
>>>>>>>>
>>>>>>>> How should I go about installing Python packages that need to be
>>>>>>>> built and/or require native dependencies such as shared libraries?
>>>>>>>>
>>>>>>>> I guess I could potentially build the C-based modules using the
>>>>>>>> same version of kernel and glibc that Dataflow is running, but it
>>>>>>>> doesn't seem like there's any way to install shared libraries on
>>>>>>>> these boxes, right?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Dmitry Demeshchuk.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Dmitry Demeshchuk.
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Dmitry Demeshchuk.
>>>
>>
>>
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>



-- 
Best regards,
Dmitry Demeshchuk.
