airflow-commits mailing list archives

From "Galak (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-2010) Make HttpHook inner connection pool configurable
Date Thu, 18 Jan 2018 13:07:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Galak updated AIRFLOW-2010:
---------------------------
    Description: 
HttpHook uses the requests module to perform HTTP/HTTPS calls, but it is hidden inside the implementation.
Therefore, it is not possible to choose a value for the _pool_connections_ or _pool_maxsize_
parameters, which both default to 10. (see the [requests module documentation|http://docs.python-requests.org/en/latest/api/#lower-lower-level-classes])

Could the _{{requests.adapters.HTTPAdapter}}_ parameters be passed through the Airflow Connection
extra parameters?
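
A minimal sketch of that idea, assuming the Connection's extra field holds a JSON object with keys named after the {{HTTPAdapter}} arguments. The key names and both helper functions are hypothetical and are not an existing Airflow API; {{HTTPAdapter}} and {{Session.mount}} are the regular requests calls for sizing the pool.

```python
import json

# Pool-related keyword arguments accepted by requests.adapters.HTTPAdapter.
POOL_KEYS = ("pool_connections", "pool_maxsize", "pool_block")


def adapter_kwargs_from_extra(extra):
    """Pick HTTPAdapter pool settings out of a Connection 'extra' JSON string.

    The key names looked up in 'extra' are an assumption made for this sketch.
    """
    options = json.loads(extra) if extra else {}
    return {key: options[key] for key in POOL_KEYS if key in options}


def build_session(extra):
    """Return a requests.Session whose pools are sized from 'extra'."""
    # Imported here so the parsing helper above works without requests installed.
    import requests
    from requests.adapters import HTTPAdapter

    session = requests.Session()
    adapter = HTTPAdapter(**adapter_kwargs_from_extra(extra))
    # Mounting replaces the default adapters (pool size 10) for both schemes.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

With an extra value such as {{\{"pool_maxsize": 32\}}}, {{build_session}} would return a session that keeps up to 32 connections per host; keys that are absent fall back to the requests defaults.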

As a consequence, calling a REST API concurrently (using [ThreadPoolExecutor|https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor])
is limited to a maximum of 10 workers. Each additional worker triggers the following warning:
{quote}
{{ \{connectionpool.py\} WARNING - Connection pool is full, discarding connection: my.api.example.org
}}
{quote}
See [this question on Stack Overflow|https://stackoverflow.com/questions/23632794/in-requests-library-how-can-i-avoid-httpconnectionpool-is-full-discarding-con]
about HTTP connection pool configuration.
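
For comparison, outside of HttpHook the warning can be avoided by sizing the pool to the number of threads. The sketch below does this with a plain requests session; {{make_session}}, {{fetch_all}}, and their parameters are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def make_session(max_workers):
    """Build a requests.Session that keeps one connection per worker thread."""
    # Imported here so fetch_all below can be exercised without requests installed.
    import requests
    from requests.adapters import HTTPAdapter

    session = requests.Session()
    # pool_maxsize is the number of connections kept in each per-host pool.
    adapter = HTTPAdapter(pool_maxsize=max_workers)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def fetch_all(session, urls, max_workers=20):
    """GET every URL concurrently and return the status codes in input order."""

    def fetch(url):
        return session.get(url).status_code

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input URLs.
        return list(executor.map(fetch, urls))
```

A call would look like {{fetch_all(make_session(20), urls, max_workers=20)}}, keeping the pool size and the worker count equal.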

  was:
HttpHook uses the requests module to perform HTTP/HTTPS calls, but it is hidden inside the implementation.
Therefore, it is not possible to choose a value for the _pool_connections_ or _pool_maxsize_
parameters, which both default to 10. (see the [requests module documentation|http://docs.python-requests.org/en/latest/api/#lower-lower-level-classes])

Could the _{{requests.adapters.HTTPAdapter}}_ parameters be passed through the Airflow Connection
extra parameters?

As a consequence, calling a REST API concurrently (using [ThreadPoolExecutor|https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor])
is limited to a maximum of 10 workers. Each additional worker triggers the following warning:
{quote}{{{connectionpool.py} WARNING - Connection pool is full, discarding connection: my.api.example.org}}{quote}
See [this question on Stack Overflow|https://stackoverflow.com/questions/23632794/in-requests-library-how-can-i-avoid-httpconnectionpool-is-full-discarding-con]
about HTTP connection pool configuration.


> Make HttpHook inner connection pool configurable
> ------------------------------------------------
>
>                 Key: AIRFLOW-2010
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2010
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks
>    Affects Versions: 1.8.0
>            Reporter: Galak
>            Priority: Major
>
> HttpHook uses the requests module to perform HTTP/HTTPS calls, but it is hidden inside the implementation. Therefore, it is not possible to choose a value for the _pool_connections_ or _pool_maxsize_ parameters, which both default to 10. (see the [requests module documentation|http://docs.python-requests.org/en/latest/api/#lower-lower-level-classes])
> Could the _{{requests.adapters.HTTPAdapter}}_ parameters be passed through the Airflow Connection extra parameters?
> As a consequence, calling a REST API concurrently (using [ThreadPoolExecutor|https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor]) is limited to a maximum of 10 workers. Each additional worker triggers the following warning:
> {quote}
> {{ \{connectionpool.py\} WARNING - Connection pool is full, discarding connection: my.api.example.org }}
> {quote}
> See [this question on Stack Overflow|https://stackoverflow.com/questions/23632794/in-requests-library-how-can-i-avoid-httpconnectionpool-is-full-discarding-con] about HTTP connection pool configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
