airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-2412) Fix HiveCliHook.load_file to address HIVE-10541
Date Tue, 08 May 2018 09:52:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467171#comment-16467171 ]

ASF subversion and git services commented on AIRFLOW-2412:
----------------------------------------------------------

Commit baf15e11a51a07ad5adbc1be36a43f313f826a61 in incubator-airflow's branch refs/heads/master
from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=baf15e1 ]

[AIRFLOW-2412] Fix HiveCliHook.load_file to address HIVE-10541

HiveCliHook.load_file doesn't actually execute the
LOAD DATA statement via the beeline bundled with
Hive versions earlier than 2.0, due to HIVE-10541.
This PR provides a workaround for this problem.

Closes #3327 from sekikn/AIRFLOW-2412


> Fix HiveCliHook.load_file to address HIVE-10541
> -----------------------------------------------
>
>                 Key: AIRFLOW-2412
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2412
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hive_hooks, hooks
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Major
>
> HiveCliHook.load_file generates a query file and executes it using the {{-f}} option, but that file doesn't have a newline at the end. In such a case, the beeline bundled with Hive versions earlier than 1.3 doesn't execute the last query due to [a bug|https://issues.apache.org/jira/browse/HIVE-10541].
> Example:
> Register a connection and prepare the file to be loaded:
> {code}
> $ airflow connections -a --conn_id hive_cli --conn_type hive_cli --conn_host localhost --conn_port 10000 --conn_schema default --conn_extra '{"use_beeline": true, "auth": "none"}'
> [2018-05-02 18:38:48,208] {__init__.py:48} INFO - Using executor SequentialExecutor
>         Successfully added `conn_id`=hive_cli : hive_cli://:@localhost:10000/default
> $ cat /tmp/t
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> {code}
> Execute load_file via ipython:
> {code}
> In [1]: from airflow.hooks.hive_hooks import HiveCliHook
> In [2]: hook = HiveCliHook("hive_cli")
> [2018-05-02 18:50:42,161] {base_hook.py:85} INFO - Using connection to: localhost
> In [3]: hook.load_file(field_dict={"c": "int"}, filepath="/tmp/t", table="foo")
> (snip)
> [2018-05-02 18:51:06,043] {hive_hooks.py:216} INFO - beeline -u jdbc:hive2://localhost:10000/default;auth=none -f /tmp/airflow_hiveop_75jxXU/tmpmvhi0M
> [2018-05-02 18:51:07,397] {hive_hooks.py:231} INFO - Connecting to jdbc:hive2://localhost:10000/default;auth=none
> [2018-05-02 18:51:07,598] {hive_hooks.py:231} INFO - Connected to: Apache Hive (version 1.2.1)
> [2018-05-02 18:51:07,600] {hive_hooks.py:231} INFO - Driver: Hive JDBC (version 1.2.1)
> [2018-05-02 18:51:07,600] {hive_hooks.py:231} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ
> [2018-05-02 18:51:07,644] {hive_hooks.py:231} INFO - 0: jdbc:hive2://localhost:10000/default> USE default;
> [2018-05-02 18:51:07,749] {hive_hooks.py:231} INFO - No rows affected (0.104 seconds)
> [2018-05-02 18:51:07,773] {hive_hooks.py:231} INFO - 0: jdbc:hive2://localhost:10000/defTABLE
fooD DATA LOCAL INPATH '/tmp/t' OVERWRITE INTO
> [2018-05-02 18:51:07,773] {hive_hooks.py:231} INFO - Closing: 0: jdbc:hive2://localhost:10000/default;auth=none
> {code}
> The Hive table is created, but no data is loaded:
> {code}
> 0: jdbc:hive2://localhost:10000/default> SHOW TABLES;
> +-----------+--+
> | tab_name  |
> +-----------+--+
> | foo       |
> +-----------+--+
> 1 row selected (0.037 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT * FROM foo;
> +--------+--+
> | foo.c  |
> +--------+--+
> +--------+--+
> No rows selected (0.1 seconds)
> {code}
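
The workaround idea described above (make sure the generated query file ends with a newline so beeline does not skip the last statement) can be sketched roughly as follows. This is a minimal illustration, not the code from the actual commit: the helper names `ensure_trailing_newline` and `write_query_file` and the `.hql` suffix are hypothetical and are not part of HiveCliHook.

```python
import tempfile


def ensure_trailing_newline(hql: str) -> str:
    """Return hql terminated by a newline (hypothetical helper).

    Affected beeline versions skip the last statement of a file passed
    with -f when the file does not end with a newline (HIVE-10541).
    """
    return hql if hql.endswith("\n") else hql + "\n"


def write_query_file(hql: str, directory: str) -> str:
    """Write hql to a temporary file in directory and return its path.

    Hypothetical helper: mimics the "generate a query file, then run it
    with -f" step, but always terminates the file with a newline.
    """
    with tempfile.NamedTemporaryFile(
        mode="w", dir=directory, suffix=".hql", delete=False
    ) as f:
        f.write(ensure_trailing_newline(hql))
        return f.name


if __name__ == "__main__":
    hql = "USE default;\nLOAD DATA LOCAL INPATH '/tmp/t' OVERWRITE INTO TABLE foo;"
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = write_query_file(hql, tmp_dir)
        with open(path) as f:
            # The file now ends with a newline after the last statement.
            assert f.read().endswith(";\n")
```

Appending the newline only when it is missing keeps the helper idempotent, so it should be safe to apply to queries that already end correctly.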



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
