airflow-commits mailing list archives

From "Rahul Singh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-1496) Druid hook unable to load data from hdfs
Date Wed, 09 Aug 2017 16:22:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Singh updated AIRFLOW-1496:
---------------------------------
    Description: 
Hi,

I am trying to use the Druid hook to load data from HDFS into Druid. Below is my DAG script:

from datetime import datetime, timedelta
import json

from airflow.hooks import HttpHook, DruidHook
from airflow.operators import PythonOperator
from airflow.models import DAG

def check_druid_con():
    dr_hook = DruidHook(druid_ingest_conn_id='DRUID_INDEX',
                        druid_query_conn_id='DRUID_QUERY')
    dr_hook.load_from_hdfs(
        "druid_airflow",
        "hdfs://10.xx.xx.xx/demanddata/demand2.tsv",
        "stay_date",
        ["channel", "rate"],
        "2016-12-11/2017-12-13",
        1,
        -1,
        metric_spec=[{"name": "count", "type": "count"}],
        hadoop_dependency_coordinates="org.apache.hadoop:hadoop-client:2.7.3")

default_args = {
    'owner': 'TC',
    'start_date': datetime(2017, 8, 7),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('druid_data_load', default_args=default_args)

druid_task1 = PythonOperator(task_id='check_druid',
                             python_callable=check_druid_con,
                             dag=dag)


I keep getting the error: TypeError: load_from_hdfs() takes at least 10 arguments (10 given). However, I have given 10 arguments to load_from_hdfs and it still errors out. Please help.
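For context, the counts in this message likely both include the implicit `self`: Python 2 counts `self` on each side of the tally for bound method calls, so "10 given" probably means 7 positional values plus 2 keywords plus `self`, and a required positional parameter of `load_from_hdfs` may still be missing. The sketch below reproduces the behavior with a made-up hook class and parameter names (not the real Airflow signature):

```python
# Hypothetical hook whose method requires nine positional parameters
# plus two optional keyword parameters, loosely mirroring the call above.
class FakeHook(object):
    def load(self, a, b, c, d, e, f, g, h, i, metric_spec=None,
             hadoop_dependency_coordinates=None):
        return "ok"

hook = FakeHook()
try:
    # Only seven positional values plus two keywords are passed, so two
    # required positional parameters are missing. On Python 2 this raises
    # "load() takes at least 10 arguments (10 given)" because the implicit
    # `self` is counted on both sides of the comparison.
    hook.load("druid_airflow", "hdfs://host/path.tsv", "stay_date",
              ["channel", "rate"], "2016-12-11/2017-12-13", 1, -1,
              metric_spec=[{"name": "count", "type": "count"}],
              hadoop_dependency_coordinates="org.apache.hadoop:hadoop-client:2.7.3")
except TypeError as exc:
    print("TypeError: %s" % exc)
```

If this is the cause, the fix would be to check the `load_from_hdfs` signature in the installed Airflow version and supply every required positional argument.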

Regards
Rahul


> Druid hook unable to load data from hdfs
> ----------------------------------------
>
>                 Key: AIRFLOW-1496
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1496
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>    Affects Versions: 1.8.0
>         Environment: RHEL 6.7 , Python 2.7.13
>            Reporter: Rahul Singh
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
