airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From 李立伟 <hilili...@gmail.com>
Subject airflow BashOperator use beeline to insert data to hive with chinese,data be garbled
Date Wed, 10 Oct 2018 01:32:47 GMT
with BashOperator ,i use beeline to insert data into hive ,hql with chinese
characters ,after dag run success,hive data contain unreadable code.

python :

# -*- coding: utf-8 -*-
import airflow
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import BranchPythonOperator
from datetime import datetime
import time
from datetime import timedelta
import sys

import pendulum
local_tz = pendulum.timezone("Asia/Shanghai")

reload(sys)
sys.setdefaultencoding('utf-8')

default_args = {
              'owner': 'airflow',
              'depends_on_past':False,
              'start_date':datetime(2018,10,9,19,22,20,tzinfo=local_tz),
              'retries':0
}

dag = DAG(
            'inserthiveutf8',
             default_args=default_args,
             description='null',
              catchup=False,
             schedule_interval=None
)


adf37 = r"""
beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p
"*****"  -e 'insert into   di_zz.tt_wms_inout_detail_new(fac_id)
values ("中")'

  """
abcd8491539084126613 =BashOperator(
               task_id='abcd8491539084126613',
               bash_command=adf37,
               dag=dag)

i have tried this:

abcd8491539084126613 =BashOperator(
               task_id='abcd8491539084126613',
               bash_command="sh  ~/insert.sh  ",
               dag=dag)

this:

export LANG=en_US.UTF-8
 beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p
"*****"  -e 'insert into   di_zz.tt_wms_inout_detail_new(fac_id)
values ("中")'

this:

beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p
"*****"  -f ~/hql.sql

log:

[2018-10-09 21:00:58,485] {bash_operator.py:110} INFO - INFO  :
Compiling command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9):
insert into   di_zz.tt_wms_inout_detail_new(fac_id) values ("???")
[2018-10-09 21:00:58,485] {bash_operator.py:110} INFO - INFO  :
Semantic Analysis Completed
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO  :
Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0,
type:timestamp, comment:null), FieldSchema(name:_col1, type:string,
comment:null), FieldSchema(name:_col2, type:void, comment:null),
FieldSchema(name:_col3, type:void, comment:null),
FieldSchema(name:_col4, type:void, comment:null),
FieldSchema(name:_col5, type:void, comment:null),
FieldSchema(name:_col6, type:void, comment:null),
FieldSchema(name:_col7, type:bigint, comment:null),
FieldSchema(name:_col8, type:bigint, comment:null),
FieldSchema(name:_col9, type:bigint, comment:null),
FieldSchema(name:_col10, type:void, comment:null),
FieldSchema(name:_col11, type:timestamp, comment:null)],
properties:null)
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO  :
Completed compiling
command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9);
Time taken: 0.291 seconds
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO  :
Executing command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9):
insert into   di_zz.tt_wms_inout_detail_new(fac_id) values ("???")
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO  : Query
ID = hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9

data:

+------------------------------------+---------------------------------+-----------------------------------+----------------------------------+------------------------------------+------------------------------------+---------------------------------------+-----------------------------------+----------------------------------+-----------------------------------+-------------------------------------------+--------------------------------------+--+
| tt_wms_inout_detail_new.stat_date  | tt_wms_inout_detail_new.fac_id
| tt_wms_inout_detail_new.fac_name  | tt_wms_inout_detail_new.ware_id
| tt_wms_inout_detail_new.ware_name  |
tt_wms_inout_detail_new.ware_type  |
tt_wms_inout_detail_new.product_code  |
tt_wms_inout_detail_new.ware_cnt  | tt_wms_inout_detail_new.ware_in  |
tt_wms_inout_detail_new.ware_out  |
tt_wms_inout_detail_new.sap_factory_name  |
tt_wms_inout_detail_new.di_etl_date  |
+------------------------------------+---------------------------------+-----------------------------------+----------------------------------+------------------------------------+------------------------------------+---------------------------------------+-----------------------------------+----------------------------------+-----------------------------------+-------------------------------------------+--------------------------------------+--+
| NULL                               | ���
| NULL                              | NULL
| NULL                               | NULL
   | NULL                                  | NULL
        | NULL                             | NULL
        | NULL                                      | NULL
                    |

I am searching for a long time on net. But no use. Please help or try to
give some ideas how to achieve this.

Thanks in advance.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message