airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kaxil Naik <kaxiln...@gmail.com>
Subject Re: [PROPOSE] Introduce and encourage pre-commit hooks framework to Airflow developer workflow
Date Tue, 23 Jul 2019 18:13:19 GMT
It is fully-optional so I don't think we need an AIP there.



On Tue, Jul 23, 2019 at 11:36 PM Beau Barker <beauinmelbourne@gmail.com>
wrote:

> Just add a .pre-commit-config.yaml to the project, no need for an AIP.
>
> > On 24 Jul 2019, at 2:42 am, Jarek Potiuk <Jarek.Potiuk@polidea.com>
> wrote:
> >
> > Any more comments on it?
> > Should I make an AIP for that :)?  Or should I just ask a vote/propose a
> PR
> > ? Anyone has a strong opinion?
> > I think it changes the dev workflow quite a bit on one hand, but it is
> > fully optional on the other hand.
> >
> > On Thu, Jul 4, 2019 at 9:23 AM Kamil BreguĊ‚a <kamil.bregula@polidea.com>
> > wrote:
> >
> >> +1 At first I was skeptical about this tool. I prefer to run it by
> >> hand, but I am more and more convinced that using it as a pre-commit
> >> makes sense.
> >>
> >> On Wed, Jul 3, 2019 at 10:58 PM Felix Uellendall <feluelle@pm.me.invalid
> >
> >> wrote:
> >>>
> >>> +1 (non-binding), I love that. I use a git pre-commit bash script but I
> >> think the framework looks even better. :)
> >>>
> >>> -feluelle
> >>>
> >>> -------- Original Message --------
> >>>> On Jul 2, 2019, 19:48, Maxime Beauchemin wrote:
> >>>>
> >>>> +1
> >>>>
> >>>> On Mon, Jul 1, [2019](tel:2019) at 11:20 PM Kaxil Naik <
> >> kaxilnaik@gmail.com> wrote:
> >>>>
> >>>>> +1 We have been using this on some of our Astronomer repositories
as
> >> well
> >>>>> and have been happy with it.
> >>>>>
> >>>>> Regards,
> >>>>> Kaxil
> >>>>>
> >>>>> On Tue, Jul 2, [2019](tel:2019), 11:46 Jarek Potiuk <
> >> Jarek.Potiuk@polidea.com> wrote:
> >>>>>
> >>>>>> TL;DR: I would like to make a proposal to add (easily managed)
> >>>>> pre-commit
> >>>>>> hooks with various lint checkers part of recommended (and
> >> encouraged)
> >>>>>> development workflow for Airflow. This will help with speeding-up
> >> local
> >>>>>> development cycle and decrease pressure on Travis CI.
> >>>>>>
> >>>>>> Over the last few months in our other oozie-to-airflow
> >>>>>> <https://github.com/GoogleCloudPlatform/oozie-to-airflow>
project
> >> we've
> >>>>>> been running pre-commit hooks - we've integrated the fantastic
> >>>>>> https://pre-commit.com/ framework and we never looked back.
The
> >> upcoming
> >>>>>> changes to docker images (AIP-10) makes it easy to standardise
> >>>>> pylint/mypy
> >>>>>> and other checks to be the same for every developer (using docker
> >> images)
> >>>>>> so it also makes possible to have consistent pre-commit environment
> >> for
> >>>>>> static checks for everyone.
> >>>>>>
> >>>>>> Pre-commit framework has a number of cool features you would
expect
> >> from
> >>>>>> such tool:
> >>>>>>
> >>>>>> - it is super easy to install and manage the checks ('pre-commit
> >>>>> install')
> >>>>>> - all the checks are managed in single .pre-commit-config.yaml
file
> >> (in
> >>>>>> repo)
> >>>>>> - it allows to run various checks only on the changes you are
> >> committing
> >>>>>> - it has nice and simple UI to run checks individually or skip
some
> >>>>> checks,
> >>>>>> or run on all files
> >>>>>> - it automatically uses all processors on your machine (it will
> >>>>> distribute
> >>>>>> changed files evenly)
> >>>>>> - it has great UI for reporting what's going on
> >>>>>> - it has a number of pre-defined checks that are super-useful
> >>>>>> - check if you have not left debug instructions (Ash :) - no
more
> >>>>>> problems with that!)
> >>>>>> - check if you resolved merge conflicts
> >>>>>> - check if you have not submitted a private key
> >>>>>> - check correctness of various files (shell, yaml, xml, whether
> >>>>> shebangs
> >>>>>> are added correctly etc.)
> >>>>>> - ....
> >>>>>> - it can even correct some of the problems found (it leaves
> >> corrected
> >>>>> files
> >>>>>> for you to commit):
> >>>>>> - automatically adding missing licences
> >>>>>> - automatically add python-encoding pragma
> >>>>>> - add Table of Contents for markdown files
> >>>>>> - and many more
> >>>>>> - it can be integrated in CI to run the very same checks but
on all
> >> files
> >>>>>> during CI.
> >>>>>> - you can use --no-verify switch to skip pre-commit
> >>>>>>
> >>>>>> There are multiple advantages of using pre-commit hooks. It
not only
> >>>>> speeds
> >>>>>> up local development (no more waiting for Travis) but if most
> >> people will
> >>>>>> start using it - it will also decrease the pressure on Travis
- less
> >>>>> people
> >>>>>> will be submitting PRs just to check if they pass all the checks.
I
> >> think
> >>>>>> that might be single biggest thing that we can do to have far
less
> >> CPU
> >>>>> used
> >>>>>> by Airflow builds.
> >>>>>>
> >>>>>> There is one disadvantage I noticed - when you have unstaged
> >> changes,
> >>>>>> pre-commit stashes them before running checks and sometimes
(if you
> >>>>> Ctrl^C
> >>>>>> very quickly) it will not stash them back - but it is recoverable
> >> it just
> >>>>>> needs one liner to recover. In normal circumstances it is really
> >> safe to
> >>>>>> run.
> >>>>>>
> >>>>>> I already made a working POC of pre-commit (not intended for
merge
> >> - it
> >>>>> has
> >>>>>> many more unrelated changes that I am going to submit as separate
> >> PRs).
> >>>>> It
> >>>>>> takes less than [20-30](tel:2030) seconds to run all the checks
in
> >> a small commit
> >>>>>> (including pylint/mypy and other checks) so IMHO it is perfectly
OK
> >> to
> >>>>> run
> >>>>>> on every commit. You can see the Job in CI here:
> >>>>>> https://travis-ci.org/PolideaInternal/airflow/jobs/552862055
and a
> >> short
> >>>>>> documentation I wrote on using pre-commits here:
> >>>>>>
> >>>>>>
> >>>>>
> >>
> https://github.com/PolideaInternal/airflow/blob/pre-commit-hooks/CONTRIBUTING.md#pre-commit-hooks
> >>>>>>
> >>>>>> Here are the checks I managed to get running for Airflow (pylint
is
> >>>>> skipped
> >>>>>> for now until we finish it but it can be tested in separate
job - I
> >> also
> >>>>>> did it in the same build):
> >>>>>>
> >>>>>> Docker build - might take longer on setup.py change (see
> >>>>>> ./.build/cache dir for logs).......................Passed
> >>>>>> Add licence for all XML, md
> >>>>>>
> >>>>>>
> >>>>>
> >>
> files...........................................................................Passed
> >>>>>> Add licence for all JS
> >>>>>>
> >>>>>>
> >>>>>
> >>
> files................................................................................Passed
> >>>>>> Add licence for all SQL
> >>>>>>
> >>>>>>
> >>>>>
> >>
> files...............................................................................Passed
> >>>>>> Add licence for all JINJA template
> >>>>>>
> >>>>>>
> >>>>>
> >>
> files....................................................................Passed
> >>>>>> Add licence for all other
> >>>>>>
> >>>>>>
> >>>>>
> >>
> files.............................................................................Passed
> >>>>>> No-tabs
> >>>>>>
> >>>>>
> >>
> checker.............................................................................................Passed
> >>>>>> Check yaml files with
> >>>>>>
> >>>>>>
> >>>>>
> >>
> yamllint..............................................................................Passed
> >>>>>> Check that executables have
> >>>>>>
> >>>>>>
> >>>>>
> >>
> shebangs........................................................................Passed
> >>>>>> Check for merge
> >>>>>>
> >>>>>>
> >>>>>
> >>
> conflicts...................................................................................Passed
> >>>>>> Check
> >>>>>>
> >>>>>
> >>
> Xml...................................................................................................Passed
> >>>>>> Debug Statements
> >>>>>>
> >>>>>>
> >>>>>
> >>
> (Python)...................................................................................Passed
> >>>>>> Detect Private
> >>>>>>
> >>>>>
> >>
> Key..........................................................................................Passed
> >>>>>> Fix python encoding
> >>>>>>
> >>>>>>
> >>>>>
> >>
> pragma..................................................................................Passed
> >>>>>> Fix End of
> >>>>>>
> >>>>>
> >>
> Files............................................................................................Passed
> >>>>>> Mixed line
> >>>>>>
> >>>>>
> >>
> ending...........................................................................................Passed
> >>>>>> Check hooks apply to the
> >>>>>>
> >>>>>>
> >>>>>
> >>
> repository.........................................................................Passed
> >>>>>> Add TOC to markdown
> >>>>>>
> >>>>>>
> >>>>>
> >>
> documents...............................................................................Passed
> >>>>>> Check Shell scripts syntax
> >>>>>>
> >>>>>>
> >>>>>
> >>
> corectness.......................................................................Passed
> >>>>>> Lint
> >>>>>>
> >>>>>
> >>
> dockerfile.............................................................................................Passed
> >>>>>> Run
> >>>>>>
> >>>>>
> >>
> mypy....................................................................................................Passed
> >>>>>> Run pylint for main
> >>>>>>
> >>>>>>
> >>>>>
> >>
> sources................................................................................Skipped
> >>>>>> Run pylint for
> >>>>>>
> >>>>>
> >>
> tests.......................................................................................Skipped
> >>>>>> Run
> >>>>>>
> >>>>>
> >>
> flake8..................................................................................................Passed
> >>>>>>
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> Jarek Potiuk
> >>>>>> Polidea <https://www.polidea.com/> | Principal Software
Engineer
> >>>>>>
> >>>>>> M: [+48 660 796 129](tel:+48660796129)
> >> <[+48660796129](tel:+48660796129)>
> >>>>>> [image: Polidea] <https://www.polidea.com/>
> >>>>>>
> >>>>>
> >>
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
>


-- 
*Kaxil Naik*
*Big Data Consultant | DevOps Data Engineer*
*Certified *Google Cloud Data Engineer | *Certified* Apache Spark & Neo4j
Developer
*LinkedIn*: https://www.linkedin.com/in/kaxil

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message