airflow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bence Nagy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-139) Executing VACUUM with PostgresOperator
Date Tue, 12 Jul 2016 07:44:20 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372416#comment-15372416
] 

Bence Nagy commented on AIRFLOW-139:
------------------------------------

Oh, sorry, I'm subscribed to all Jira notifications but I think that's a bad idea since I
didn't see me specifically being mentioned here. So to answer a month late:

[~mattland]'s suggestion does not sound right to me at all. Special cases are already bad,
but in this case they wouldn't be even correct; Postgres 8.0.2 does not support autocommit
and even though it's unlikely to be running for people, it is just plain wrong in my opinion
to include an exception for it just because Redshift happened to fork that version. Not only
that, but then Redshift would break as well if Amazon ever decided to change their server
version (even if just rebasing onto 8.0.3 or something.) So, yeah, my stance here is a big
'no, don't do that'.

As I mentioned in the GitHub commit comments as well, the entire fact that Redshift worked
with the PostgresOperator is a coincidence, and as far as I know, was never officially said
to be supported by it. Due to nuances like this one, most other products working with databases
that I've seen (for instance DataGrip and Metabase) consider Redshift to be separate from
vanilla PostgreSQL, with the former requiring a special DB driver for it, and the latter not
supporting Redshift at all even though they support Postgres. In light of all this, I think
the correct and logical solution here is to add a RedshiftOperator, with different behavior
around the value of {{supports_autocommit}} and possibly some other differences as well down
the road if the operators get more complex. I don't have a strong opinion on whether it should
be a subclass of PostgresOperator, but I think I would implement this like that myself, since
after all, Redshift is a derivative of Postgres, sharing quite a lot of behavior.

> Executing VACUUM with PostgresOperator
> --------------------------------------
>
>                 Key: AIRFLOW-139
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-139
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: Airflow 1.7.0
>            Reporter: Rafael
>
> Dear Airflow Maintainers,
> h1. Environment
> * Airflow version: *v1.7.0*
> * Airflow components: *PostgresOperator*
> * Python Version: *Python 3.5.1*
> * Operating System: *15.4.0 Darwin*
> h1. Description of Issue
> I am trying to execute a `VACUUM` command as part of DAG with the `PostgresOperator`,
which fails with the following error:
> {quote}
> [2016-05-14 16:14:01,849] {__init__.py:36} INFO - Using executor SequentialExecutor
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 15, in <module>
>     args.func(args)
>   File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/bin/cli.py",
line 203, in run
>     pool=args.pool,
>   File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/models.py",
line 1067, in run
>     result = task_copy.execute(context=context)
>   File "/usr/local/lib/python3.5/site-packages/airflow/operators/postgres_operator.py",
line 39, in execute
>     self.hook.run(self.sql, self.autocommit, parameters=self.parameters)
>   File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/hooks/dbapi_hook.py",
line 109, in run
>     cur.execute(s)
> psycopg2.InternalError: VACUUM cannot run inside a transaction block
> {quote}
> I could create a small python script that performs the operation, as explained in [this
stackoverflow entry](http://stackoverflow.com/questions/1017463/postgresql-how-to-run-vacuum-from-code-outside-transaction-block).
However, I would like to know first if the `VACUUM` command should be supported by the `PostgresOperator`.
> h1. Reproducing the Issue
> The operator can be declared as follows:
> {quote}
> conn = ('postgres_default')
> t4 = PostgresOperator(
>     task_id='vacuum',
>     postgres_conn_id=conn,
>     sql=("VACUUM public.table"),
>     dag=dag
>     )
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message