airflow-dev mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: how to capture sqoop mapreduce counters
Date Fri, 27 Jan 2017 02:06:59 GMT
thanks Jayesh, replied via github

On Thu, Jan 26, 2017 at 7:29 PM, Jayesh Senjaliya <jhsonline@gmail.com>
wrote:

> Hi Boris,
>
> it looks like bash_operator has the same bug as ssh_execute_operator: it
> does not capture multi-line output.
>
> I have put up a fix for bash_operator as well:
> https://github.com/apache/incubator-airflow/pull/2026
>
> please take a look.
>
> Thanks
> Jayesh
>
> On Wed, Jan 25, 2017 at 1:25 PM, Boris Tyukin <boris@boristyukin.com>
> wrote:
>
> > I figured out that, luckily for me, the number of rows loaded by sqoop is
> > reported to stdout as the very last line. So I just used BashOperator and
> > set xcom_push=True. Then I did something like this:
> >
> >     # Log row_count ingested; -1 marks a count that could not be parsed.
> >     # re.search returns None when nothing matches, so test the match
> >     # instead of catching ValueError (assumes `import re` at module top).
> >     last_line = kwargs['ti'].xcom_pull(task_ids='t_sqoop_from_cerner')
> >     match = re.search(r'Retrieved (\d+) records', last_line or '')
> >     row_count = int(match.group(1)) if match else -1
> >     write_job_audit(get_job_audit_id_from_context(kwargs),
> >                     "rows_ingested_sqoop", row_count)
> >
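A minimal sketch of the parsing step above as a standalone function, so it can be tested without Airflow. The function name parse_retrieved_records and the sample log line are assumptions for illustration; only the "Retrieved N records" pattern comes from the message.

```python
import re

# Pattern taken from the message above; sqoop's exact wording may vary by version.
_RETRIEVED = re.compile(r"Retrieved (\d+) records")


def parse_retrieved_records(last_line):
    """Return the row count found in `last_line`, or -1 if it is absent.

    `last_line` may be None, for example when no XCom value was pushed.
    """
    match = _RETRIEVED.search(last_line or "")
    return int(match.group(1)) if match else -1


# Hypothetical sample line, shaped like the final line of a sqoop import log.
sample = "17/01/25 13:20:11 INFO mapreduce.ImportJobBase: Retrieved 1234 records."
print(parse_retrieved_records(sample))
print(parse_retrieved_records(None))
```

Returning -1 for both a missing XCom value and a non-matching line keeps the audit write on a single code path.
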
> > The alternative I was considering is to get the mapreduce job ID and then
> > use the mapred command to get the needed counter. Here is an example:
> >
> >     mapred job -counter job_1484574566480_0002 org.apache.hadoop.mapreduce.TaskCounter MAP_OUTPUT_RECORDS
> >
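A rough sketch of how the mapred call above might be wrapped in Python. It assumes that the command prints the counter value as a bare integer on its own line, possibly after some log lines; that output shape should be checked on the actual cluster. The helper names counter_command, parse_counter_value and fetch_counter are hypothetical.

```python
import re
import subprocess


def counter_command(job_id, group, counter):
    """Build the argv list for `mapred job -counter`."""
    return ["mapred", "job", "-counter", job_id, group, counter]


def parse_counter_value(stdout_text):
    """Return the last all-digit line of the output as an int, or None.

    Assumption: the counter value is printed alone on a line.
    """
    for line in reversed(stdout_text.splitlines()):
        line = line.strip()
        if re.fullmatch(r"[0-9]+", line):
            return int(line)
    return None


def fetch_counter(job_id, group, counter):
    """Run the command and parse its output (needs a Hadoop client on PATH)."""
    result = subprocess.run(
        counter_command(job_id, group, counter),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_counter_value(result.stdout)
```

With the values from the example, the call would be fetch_counter("job_1484574566480_0002", "org.apache.hadoop.mapreduce.TaskCounter", "MAP_OUTPUT_RECORDS").
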
> > But I could not figure out an easy way to get the job_id from the
> > BashOperator / sqoop output. I guess I could create my own operator that
> > would capture all stdout lines, not only the last one.
> >
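A minimal sketch of the "capture all lines" idea, outside of any Airflow operator. It assumes that the job ID appears somewhere in the command output in the usual job_<timestamp>_<sequence> form, as in the mapred example above. The names find_job_id and run_and_capture are hypothetical, and this is not the fix from the pull request mentioned in this thread.

```python
import re
import subprocess

# MapReduce job IDs look like job_<cluster timestamp>_<sequence>.
_JOB_ID = re.compile(r"\bjob_\d+_\d+\b")


def find_job_id(lines):
    """Return the first MapReduce job ID seen in `lines`, or None."""
    for line in lines:
        match = _JOB_ID.search(line)
        if match:
            return match.group(0)
    return None


def run_and_capture(argv):
    """Run a command and return (exit code, list of all output lines)."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so log lines are not missed
        universal_newlines=True,
    )
    lines = [line.rstrip("\n") for line in proc.stdout]
    return proc.wait(), lines
```

A custom operator could call run_and_capture with the sqoop command line, pass the lines to find_job_id, and push the result to XCom instead of the last line only.
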
> > On Tue, Jan 24, 2017 at 9:07 AM, Boris Tyukin <boris@boristyukin.com>
> > wrote:
> >
> > > Hello all,
> > >
> > > is there a way to capture sqoop counters using either the bash or the
> > > sqoop operator? Specifically, I need to pull the total number of rows
> > > loaded.
> > >
> > > Looking at the bash operator, I think there is an option to push the
> > > last line of output to xcom, but sqoop and mapreduce output is a bit
> > > more complicated.
> > >
> > > Thanks!
> > >
> >
>
