airflow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [airflow] dazza-codes edited a comment on issue #5788: [POC] multi-threading using asyncio
Date Thu, 02 Jan 2020 17:05:35 GMT
dazza-codes edited a comment on issue #5788: [POC] multi-threading using asyncio
URL: https://github.com/apache/airflow/pull/5788#issuecomment-570269996
 
 
   I'm not opposed to this PR/POC if it works; are there additional unit tests and functional
tests to demonstrate that it works?  I don't know enough about how it performs and how it
manages the event loop and how the loop manages subprocesses - it seems like subprocesses
cannot yield from (or await) to return control back to the main loop from anywhere within
the `command` (i.e. the command itself cannot use asyncio).  I want to understand a bit more
about the Airflow task ecosystem and how asyncio/coroutines fit into an existing workflow
or require an alternative workflow ecosystem (additional notes are in AIP-28).  A key part
of the architecture is whether an event loop belongs on an executor and/or on workers; AFAIK,
an executor could manage an event loop and all the "tasks" that it runs need to be compatible
with asyncio/coroutine behavior (and non-blocking libraries need to be used for db access
etc).
   
   In this PR, the primary point of interest and/or confusion is in
   ```
   foo = await asyncio.create_subprocess_exec(
                   *command,
                   stdin=asyncio.subprocess.PIPE,
                   stdout=asyncio.subprocess.PIPE,
                   stderr=asyncio.subprocess.PIPE,
                   loop=self.loop
               )
   ```
   vs. using an awaitable coroutine or task as in the sequence diagram noted in
   https://docs.python.org/3.6/library/asyncio-task.html#example-chain-coroutines
   
   By using a subprocess, the `command` itself has no ability to use asyncio coroutines -
is that right?  That's not entirely a bad thing, but if that's correct, then it could mean
that this worker could spawn a lot of processes that are all blocking within each process.
 I just wonder whether it really differs from using multiprocessing in this regard.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

Mime
View raw message