airflow-dev mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: how to capture sqoop mapreduce counters
Date Wed, 25 Jan 2017 21:25:59 GMT
I figured out that, luckily for me, the number of rows loaded by sqoop is
reported to stdout as the very last line. So I just used BashOperator and
set xcom_push=True. Then I did something like this:

    # Log row_count ingested (needs "import re" at module level)
    try:
        row_count = int(re.search(r'Retrieved (\d+) records',
                                  kwargs['ti'].xcom_pull(task_ids='t_sqoop_from_cerner')).group(1))
        write_job_audit(get_job_audit_id_from_context(kwargs),
                        "rows_ingested_sqoop", row_count)
    except (ValueError, AttributeError):
        # AttributeError covers re.search() returning None (no match)
        write_job_audit(get_job_audit_id_from_context(kwargs),
                        "rows_ingested_sqoop", -1)

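To make that concrete, here is roughly how the two tasks could be wired
together (the DAG name, the echo stand-in for the real sqoop command, and
the audit callable are placeholders; the mechanism is BashOperator's
xcom_push=True, which pushes the last stdout line to XCom):

    import re
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.python_operator import PythonOperator

    dag = DAG('sqoop_audit_example', start_date=datetime(2017, 1, 1),
              schedule_interval=None)

    # xcom_push=True makes BashOperator push the last stdout line to XCom
    t_sqoop = BashOperator(
        task_id='t_sqoop_from_cerner',
        bash_command='echo "Retrieved 42 records."',  # stand-in for sqoop import
        xcom_push=True,
        dag=dag)

    def audit_row_count(**kwargs):
        last_line = kwargs['ti'].xcom_pull(task_ids='t_sqoop_from_cerner')
        match = re.search(r'Retrieved (\d+) records', last_line or '')
        row_count = int(match.group(1)) if match else -1
        print("rows_ingested_sqoop = %d" % row_count)

    t_audit = PythonOperator(
        task_id='t_audit_row_count',
        python_callable=audit_row_count,
        provide_context=True,
        dag=dag)

    t_sqoop.set_downstream(t_audit)
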
The alternative I was considering is to get the MapReduce job id and then
use the mapred command to fetch the needed counter. Here is an example:

mapred job -counter job_1484574566480_0002 org.apache.hadoop.mapreduce.TaskCounter MAP_OUTPUT_RECORDS
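
If I went that route, pulling the job id out of the full sqoop log and
shelling out to mapred might look roughly like this (the "Running job:"
pattern is what the hadoop client usually logs when it submits a job, but
verify it against your own sqoop output; the function name is made up):

    import re
    import subprocess

    def get_mapred_counter(sqoop_log,
                           group='org.apache.hadoop.mapreduce.TaskCounter',
                           counter='MAP_OUTPUT_RECORDS'):
        """Pull a counter for the MapReduce job found in sqoop's log."""
        # the client typically logs "Running job: job_1484574566480_0002"
        match = re.search(r'Running job: (job_\d+_\d+)', sqoop_log)
        if match is None:
            return None
        out = subprocess.check_output(
            ['mapred', 'job', '-counter', match.group(1), group, counter])
        return int(out.decode().strip())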

But I could not figure out an easy way to get the job_id from BashOperator /
sqoop output. I guess I could create my own operator that would capture all
stdout lines, not only the last one.
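
Something like this could be a starting point (just a sketch, not tested;
the operator name is made up, and the apply_defaults import path varies a
bit across 1.x versions):

    import subprocess

    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults

    class FullOutputBashOperator(BaseOperator):
        """Run a bash command and push its entire output to XCom."""

        @apply_defaults
        def __init__(self, bash_command, *args, **kwargs):
            super(FullOutputBashOperator, self).__init__(*args, **kwargs)
            self.bash_command = bash_command

        def execute(self, context):
            # capture combined stdout/stderr instead of only the last line
            output = subprocess.check_output(
                self.bash_command, shell=True, stderr=subprocess.STDOUT)
            # whatever execute() returns is pushed to XCom automatically
            return output.decode()

The pushed string could then feed both the row-count regex above and the
job-id regex for the mapred approach.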

On Tue, Jan 24, 2017 at 9:07 AM, Boris Tyukin <boris@boristyukin.com> wrote:

> Hello all,
>
> Is there a way to capture sqoop counters using either the bash or sqoop
> operator? Specifically, I need to pull the total number of rows loaded.
>
> Looking at the bash operator, I think there is an option to push the last
> line of output to XCom, but sqoop and mapreduce output is a bit more
> complicated.
>
> Thanks!
>
