airflow-commits mailing list archives

From "Alexandre Blanchard (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-5126) Read aws_session_token in extra_config of the aws hook
Date Tue, 06 Aug 2019 13:21:00 GMT
Alexandre Blanchard created AIRFLOW-5126:
--------------------------------------------

             Summary: Read aws_session_token in extra_config of the aws hook
                 Key: AIRFLOW-5126
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5126
             Project: Apache Airflow
          Issue Type: Improvement
          Components: hooks
    Affects Versions: 1.10.3
            Reporter: Alexandre Blanchard


Hi,

Thanks for the great software.

At my company, we enforce security around our AWS account and all accounts must have MFA activated.
To use Airflow with my account, I generate a session token with an expiration date using the command:
{code:bash}
aws sts assume-role \
  --role-arn <the-role-i-want-to-use> \
  --role-session-name testing \
  --serial-number <my-personal-mfa-arn> \
  --token-code <code-on-my-mfa-device> \
  --duration-seconds 18000
{code}
This way I retrieve everything I need to connect to AWS: an aws_access_key_id, an aws_secret_access_key,
and an aws_session_token.
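Used directly with boto3, these temporary credentials work; roughly like this (a minimal sketch with placeholder values):
{code:python}
# Minimal sketch: pass the three temporary credentials straight to boto3
# (placeholder credential values and placeholder bucket name).
import boto3

session = boto3.session.Session(
    aws_access_key_id="<aws_access_key_id>",
    aws_secret_access_key="<aws_secret_access_key>",
    aws_session_token="<aws_session_token>",
)
s3_client = session.client("s3")
s3_client.head_object(Bucket="<s3_bucket>", Key="poc/raw_data.csv.gz")
{code}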

Currently I'm using boto3 directly in my DAG, roughly as sketched above, and it's working great. I would
like to use a connection managed by Airflow instead, but when I set the parameters this way:
{code:bash}
airflow connections --add \
 --conn_id s3_log \
 --conn_type s3 \
 --conn_login "<aws_access_key_id>" \
 --conn_password "<aws_secret_access_key>" \
 --conn_extra "{ \
   \"aws_session_token\": \"<aws_session_token>\" \
}"
{code}
When a hook uses this connection, I get the following error:
{code}
[2019-08-06 12:31:28,157] {__init__.py:1580} ERROR - An error occurred (403) when calling the HeadObject operation: Forbidden
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 112, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 117, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/root/airflow/dags/s3Dag.py", line 48, in download_raw_data
    dataObject = s3hook.get_key("poc/raw_data.csv.gz", s3_bucket)
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/S3_hook.py", line 217, in get_key
    obj.load()
  File "/usr/local/lib/python3.7/site-packages/boto3/resources/factory.py", line 505, in do_action
    response = action(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
{code}
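For context, the failing task looks roughly like this (a simplified sketch of my DAG; the bucket name is a placeholder):
{code:python}
# Simplified sketch of the failing task in s3Dag.py (placeholder bucket name).
# The S3Hook uses the s3_log connection defined above.
from airflow.hooks.S3_hook import S3Hook

def download_raw_data(**kwargs):
    s3hook = S3Hook(aws_conn_id="s3_log")
    dataObject = s3hook.get_key("poc/raw_data.csv.gz", "<s3_bucket>")
    # ... process the downloaded object ...
{code}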
Reading the code of the hook (https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/hooks/aws_hook.py#L90),
I understand that the session token is not read from the extra config. The only case in which a session
token is passed to the boto3 client is when the hook itself assumes a role. In my case, I want to use a
role I have already assumed.

So my suggestion is to read the session token from the extra config and use it to connect to AWS.
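Something along these lines is what I have in mind (a rough, untested sketch, not the actual hook code; the helper name is hypothetical):
{code:python}
# Rough, untested sketch (hypothetical helper, not the actual hook code):
# forward aws_session_token from the connection's extra config to boto3,
# alongside the existing login/password handling.
import boto3

def build_boto3_session(connection, region_name=None):
    extra_config = connection.extra_dejson
    return boto3.session.Session(
        aws_access_key_id=connection.login,
        aws_secret_access_key=connection.password,
        aws_session_token=extra_config.get("aws_session_token"),
        region_name=region_name,
    )
{code}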

Do you think this is the right way to do it? Does this workflow make sense?

I am ready to contribute if my suggestion is accepted.

Regards



