airflow-commits mailing list archives

From "Alexey Sanko (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-295) Beeline called into HiveCliHook.run() read unclosed file and skip last statement
Date Thu, 30 Jun 2016 07:09:10 GMT
Alexey Sanko created AIRFLOW-295:
------------------------------------

             Summary: Beeline called into HiveCliHook.run() read unclosed file and skip last statement
                 Key: AIRFLOW-295
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-295
             Project: Apache Airflow
          Issue Type: Bug
            Reporter: Alexey Sanko


If the hql passed to a HiveOperator that uses a beeline connection contains multiple statements and does not end with an empty line, beeline skips the last statement.
As I understand it, beeline directly reads a file that is still open (it cannot be closed because of the NamedTemporaryFile class usage) and hits an unexpected EOF.
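The failure mode above can be reproduced without Hive at all. The following is a minimal sketch (not Airflow code): data written to a NamedTemporaryFile stays in Python's write buffer until it is flushed, so a child process reading the file by name (the way `beeline -f` does) can see a truncated script.

```python
import subprocess
import sys
import tempfile

hql = "use asanko;\ndrop table if exists test_airflow_dual;\ndesc test_airflow_dual;\n"

with tempfile.NamedTemporaryFile(mode="w", suffix=".hql") as f:
    f.write(hql)
    # Without this flush, the tail of the script can still sit in
    # Python's write buffer, so a subprocess reading f.name (the way
    # `beeline -f` does) sees a truncated file and drops statements.
    f.flush()
    # Stand-in for the beeline call: a child process that prints
    # exactly what it can see on disk.
    seen = subprocess.run(
        [sys.executable, "-c",
         "import sys; sys.stdout.write(open(sys.argv[1]).read())", f.name],
        capture_output=True, text=True, check=True).stdout

assert seen == hql  # after flush() the whole script reaches the child
```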
{code}
hive = HiveOperator(
    dag = hive_dag,
    start_date = datetime(2016, 1, 1),
    task_id='asanko_cli_remote_test',
    hql = """
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}';
desc asanko.test_airflow_dual;
""",
    hive_cli_conn_id='asanko_hive_cli_beeline',
    schema='asanko',
    default_args=args,
    run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:01:51,346] {models.py:1041} INFO - Executing <Task(HiveOperator): asanko_cli_remote_test> on 2016-01-01 00:00:00
[2016-06-29 03:01:51,354] {hive_operator.py:63} INFO - Executing: 
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01';
desc asanko.test_airflow_dual;
[2016-06-29 03:01:51,357] {base_hook.py:53} INFO - Using connection to: asanko_hive
[2016-06-29 03:01:51,358] {hive_hooks.py:105} INFO - beeline -f /tmp/airflow_hiveop_EDQ7kE/tmpuSq2NR -u jdbc:hive2://asanko_hive:10000/default;auth=none -n asanko -p pwd
[2016-06-29 03:01:52,119] {hive_hooks.py:116} INFO - scan complete in 3ms
[2016-06-29 03:01:52,120] {hive_hooks.py:116} INFO - Connecting to jdbc:hive2://asanko_hive:10000/default;auth=none
[2016-06-29 03:01:52,375] {hive_hooks.py:116} INFO - Connected to: Apache Hive (version 0.12.0-cdh5.1.3)
[2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Driver: Hive JDBC (version 0.12.0-cdh5.1.3)
[2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ
[2016-06-29 03:01:52,385] {hive_hooks.py:116} INFO - Beeline version 0.12.0-cdh5.1.3 by Apache Hive
[2016-06-29 03:01:52,386] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> USE asanko;
[2016-06-29 03:01:52,428] {hive_hooks.py:116} INFO - No rows affected (0.041 seconds)
[2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive>
[2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> use asanko;
[2016-06-29 03:01:52,451] {hive_hooks.py:116} INFO - No rows affected (0.01 seconds)
[2016-06-29 03:01:52,452] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> drop table if exists test_airflow_dual;
[2016-06-29 03:01:52,463] {hive_hooks.py:116} INFO - No rows affected (0.009 seconds)
[2016-06-29 03:01:52,465] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01';
[2016-06-29 03:01:55,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:00,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:05,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:10,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:15,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:20,003] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:25,011] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:30,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:35,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:40,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:45,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:46,575] {hive_hooks.py:116} INFO - No rows affected (54.109 seconds)
[2016-06-29 03:02:46,578] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> desc asanko.test_airflow_dual;Closing: org.apache.hive.jdbc.HiveConnection
{code}

But if we manually add an empty line at the end, the last statement runs successfully:
{code}
hive = HiveOperator(
    dag = hive_dag,
    start_date = datetime(2016, 1, 1),
    task_id='asanko_cli_remote_test',
    hql = """
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}';
desc asanko.test_airflow_dual;

""",
    hive_cli_conn_id='asanko_hive_cli_beeline',
    schema='asanko',
    default_args=args,
    run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:04:01,378] {models.py:1041} INFO - Executing <Task(HiveOperator): asanko_cli_remote_test> on 2016-01-01 00:00:00
[2016-06-29 03:04:01,386] {hive_operator.py:63} INFO - Executing: 
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01';
desc asanko.test_airflow_dual;

[2016-06-29 03:04:01,388] {base_hook.py:53} INFO - Using connection to: asanko_hive
[2016-06-29 03:04:01,390] {hive_hooks.py:105} INFO - beeline -f /tmp/airflow_hiveop_vmWhkH/tmpDq9Lyp -u jdbc:hive2://asanko_hive:10000/default;auth=none -n asanko -p pwd
[2016-06-29 03:04:02,216] {hive_hooks.py:116} INFO - scan complete in 2ms
[2016-06-29 03:04:02,217] {hive_hooks.py:116} INFO - Connecting to jdbc:hive2://asanko_hive:10000/default;auth=none
[2016-06-29 03:04:02,708] {hive_hooks.py:116} INFO - Connected to: Apache Hive (version 0.12.0-cdh5.1.3)
[2016-06-29 03:04:02,708] {hive_hooks.py:116} INFO - Driver: Hive JDBC (version 0.12.0-cdh5.1.3)
[2016-06-29 03:04:02,709] {hive_hooks.py:116} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ
[2016-06-29 03:04:02,735] {hive_hooks.py:116} INFO - Beeline version 0.12.0-cdh5.1.3 by Apache Hive
[2016-06-29 03:04:02,735] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> USE asanko;
[2016-06-29 03:04:02,786] {hive_hooks.py:116} INFO - No rows affected (0.05 seconds)
[2016-06-29 03:04:02,800] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive>
[2016-06-29 03:04:02,800] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> use asanko;
[2016-06-29 03:04:02,810] {hive_hooks.py:116} INFO - No rows affected (0.008 seconds)
[2016-06-29 03:04:02,812] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> drop table if exists test_airflow_dual;
[2016-06-29 03:04:02,899] {hive_hooks.py:116} INFO - No rows affected (0.087 seconds)
[2016-06-29 03:04:02,902] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01';
[2016-06-29 03:04:05,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:04:10,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:04:15,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:04:20,008] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:04:25,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:04:30,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:04:35,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:04:40,005] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:04:45,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:04:50,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:04:55,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:05:00,004] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:05:05,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:05:06,221] {hive_hooks.py:116} INFO - No rows affected (63.319 seconds)
[2016-06-29 03:05:06,225] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> desc asanko.test_airflow_dual;
[2016-06-29 03:05:06,390] {hive_hooks.py:116} INFO - +-----------------------+-----------------------+-----------------------+
[2016-06-29 03:05:06,390] {hive_hooks.py:116} INFO - |       col_name        |       data_type       |        comment        |
[2016-06-29 03:05:06,390] {hive_hooks.py:116} INFO - +-----------------------+-----------------------+-----------------------+
[2016-06-29 03:05:06,391] {hive_hooks.py:116} INFO - | x                     | string                | None                  |
[2016-06-29 03:05:06,391] {hive_hooks.py:116} INFO - +-----------------------+-----------------------+-----------------------+
[2016-06-29 03:05:06,391] {hive_hooks.py:116} INFO - 1 row selected (0.166 seconds)
[2016-06-29 03:05:06,394] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> Closing: org.apache.hive.jdbc.HiveConnection
{code}
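Given the comparison above, one possible fix on the hook side is to always terminate the script with a newline and flush the temp file before handing it to beeline. The sketch below uses a hypothetical helper name (`run_hql_via_file` is not the actual HiveCliHook API) and a Python child process as a stand-in for the beeline call:

```python
import subprocess
import sys
import tempfile

def run_hql_via_file(hql):
    """Write hql to a temp file and hand that file to a child process,
    the way HiveCliHook hands a temp file to beeline (sketch only,
    hypothetical helper)."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".hql") as f:
        # Guarantee a trailing newline so the CLI never hits EOF in the
        # middle of the final statement ...
        f.write(hql if hql.endswith("\n") else hql + "\n")
        # ... and flush so the child sees the complete script even
        # though the file is still open on the Python side.
        f.flush()
        # Stand-in for beeline: echo the file contents the child sees.
        return subprocess.run(
            [sys.executable, "-c",
             "import sys; sys.stdout.write(open(sys.argv[1]).read())",
             f.name],
            capture_output=True, text=True, check=True).stdout
```

With this handling, the trailing `desc` statement from the report would reach the CLI even when the user's hql string has no final empty line.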



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
