airflow-dev mailing list archives

From "Driesprong, Fokko" <fo...@driesprong.frl>
Subject Re: API Reference - current confusion and improvement plan
Date Fri, 29 Mar 2019 11:16:07 GMT
Awesome work Kamil. Thanks for giving some love to the documentation. It
really needed some :-)

Don't forget to remove this line from the GitHub pull request template:
"When adding new operators/hooks/sensors, the autoclass documentation
generation needs to be added."
https://github.com/apache/airflow/blob/master/.github/PULL_REQUEST_TEMPLATE.md

Cheers, Fokko

On Wed, 27 Mar 2019 at 05:59, Kamil Breguła <kamil.bregula@polidea.com> wrote:

> Hi.
>
> Work on this has been completed.
> New documentation is available:
> https://airflow.readthedocs.io/en/latest/_api/index.html
>
> Greetings
> Kamil Breguła
>
> On Wed, Feb 27, 2019 at 12:51 PM Kamil Breguła
> <kamil.bregula@polidea.com> wrote:
> >
> > Hi.
> >
> > Jarek Potiuk and I have recently worked to finish these changes. As a
> > result, a PR series was created:
> >
> > - [AIRFLOW-XXX][1/3] Syntax docs improvements -
> >   https://github.com/apache/airflow/pull/4789
> > - [AIRFLOW-3968][2/3] Refactor base GCP hook -
> >   https://github.com/apache/airflow/pull/4790
> > - [AIRFLOW-3811][3/3] Add automatic generation of API Reference -
> >   https://github.com/apache/airflow/pull/4788
> >
> > I invite you to review. A preview is available in the description of each
> > PR.
> >
> > Greets,
> > Kamil Breguła
> >
> > On Wed, Feb 6, 2019 at 2:09 PM Szymon Przedwojski <szymon.przedwojski@polidea.com> wrote:
> >>
> >> +1
> >> I also like the new docs layout, and the big win is that it’s generated
> >> automatically from all files and we won’t have to modify code.rst /
> >> integration.rst manually anymore.
> >>
> >> Szymon Przedwojski
> >> Polidea | Software Engineer
> >>
> >> M: +48 500 330 790
> >> E: szymon.przedwojski@polidea.com
> >>
> >> > On 5 Feb 2019, at 21:33, Ash Berlin-Taylor <ash@apache.org> wrote:
> >> >
> >> > I have idly wondered about something like this as a layout
> >> >
> >> >    from airflow.$something.aws.operators import EmrAddStepOperator
> >> >
> >> > - Grouping by service provider is more helpful
> >> > - Having more than one operator per module
> >> > - Not having the `_operator` (etc.) suffix on the module, and the class,
> >> >   and the module path
> >> >
> >> > Perhaps a bigger change, though to make it much less painful for our
> >> > users we could keep the old names with a deprecation warning or two (even
> >> > past 2.0, to say 2.1). Out of scope for the current discussion.
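
As a minimal sketch of the "keep the old names with a deprecation warning" idea: the operator class below is a stand-in rather than the real Airflow class, the module paths in the strings are placeholders, and `deprecated_alias` is a hypothetical helper, not an existing Airflow API.

```python
import warnings


class EmrAddStepOperator:
    """Stand-in for an operator that lives at a new, provider-grouped path."""

    def __init__(self, task_id):
        self.task_id = task_id


def deprecated_alias(new_cls, old_name, old_path, new_path):
    """Build a subclass of new_cls that warns whenever the old name is used."""

    class _Deprecated(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                "{} is deprecated; import from {} instead".format(old_path, new_path),
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            super().__init__(*args, **kwargs)

    _Deprecated.__name__ = old_name
    return _Deprecated


# What the old module could keep exporting (both paths are placeholders).
EmrAddStepsOperator = deprecated_alias(
    EmrAddStepOperator,
    old_name="EmrAddStepsOperator",
    old_path="old.module.path.EmrAddStepsOperator",
    new_path="new.module.path.EmrAddStepOperator",
)
```

Existing DAG code that instantiates the old name would keep working and would only see a `DeprecationWarning`, because the alias is a subclass of the new class.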
> >> >
> >> > -ash
> >> >
> >> >> On 5 Feb 2019, at 20:22, Kamil Breguła <kamil.bregula@polidea.com> wrote:
> >> >>
> >> >> I think that we should group operators by service (e.g. Amazon Web
> >> >> Services: Simple Storage Service), with one module per service. It will
> >> >> be much easier to navigate through them. A similar problem occurs with
> >> >> the Google Cloud Storage service, but we have a solution (PR:
> >> >> https://github.com/apache/airflow/pull/3000 ). A large part of the
> >> >> existing operators, and future ones written in accordance with the
> >> >> recommendations (
> >> >> https://lists.apache.org/thread.html/e8534d82be611ae7bcb21ba371546a4278aad117d5e50361fd8f14fe@%3Cdev.airflow.apache.org%3E
> >> >> ), follow these rules.
> >> >>
> >> >> The problem will be with operators that integrate two services at the
> >> >> same time. I think that we can leave them in a separate module and link
> >> >> to this class in the description of the module.
> >> >>
> >> >> However, this is not a current problem. I just wanted to mark future
> >> >> improvements, which are possible if we introduce the proposed solution.
> >> >>
> >> >> On Tue, Feb 5, 2019 at 8:57 PM Ash Berlin-Taylor <ash@apache.org> wrote:
> >> >>
> >> >>> I like the API reference v2 layout a lot! Much easier to navigate and see
> >> >>> what classes are available, for me at least.
> >> >>>
> >> >>> Documenting modules will help somewhat with a few things but, let's say,
> >> >>> the "AWS" section of the integration doc is spread across the following
> >> >>> modules:
> >> >>>
> >> >>> airflow.contrib.operators.aws_athena_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/aws_athena_operator/index.html>
> >> >>> airflow.contrib.operators.awsbatch_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/awsbatch_operator/index.html>
> >> >>> airflow.contrib.operators.ecs_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/ecs_operator/index.html>
> >> >>> airflow.contrib.operators.emr_add_steps_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_add_steps_operator/index.html>
> >> >>> airflow.contrib.operators.emr_create_job_flow_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_create_job_flow_operator/index.html>
> >> >>> airflow.contrib.operators.emr_terminate_job_flow_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_terminate_job_flow_operator/index.html>
> >> >>> airflow.contrib.operators.s3_copy_object_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_copy_object_operator/index.html>
> >> >>> airflow.contrib.operators.s3_delete_objects_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_delete_objects_operator/index.html>
> >> >>> airflow.contrib.operators.s3_list_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_list_operator/index.html>
> >> >>> airflow.contrib.operators.s3_to_gcs_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs_operator/index.html>
> >> >>> airflow.contrib.operators.s3_to_gcs_transfer_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs_transfer_operator/index.html>
> >> >>> airflow.contrib.operators.s3_to_sftp_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_sftp_operator/index.html>
> >> >>> airflow.contrib.operators.sagemaker_base_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_base_operator/index.html>
> >> >>> airflow.contrib.operators.sagemaker_endpoint_config_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_endpoint_config_operator/index.html>
> >> >>> airflow.contrib.operators.sagemaker_endpoint_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_endpoint_operator/index.html>
> >> >>> airflow.contrib.operators.sagemaker_model_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_model_operator/index.html>
> >> >>> airflow.contrib.operators.sagemaker_training_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_training_operator/index.html>
> >> >>> airflow.contrib.operators.sagemaker_transform_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_transform_operator/index.html>
> >> >>> airflow.contrib.operators.sagemaker_tuning_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_tuning_operator/index.html>
> >> >>> airflow.contrib.operators.segment_track_event_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/segment_track_event_operator/index.html>
> >> >>> airflow.operators.redshift_to_s3_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/redshift_to_s3_operator/index.html>
> >> >>> airflow.operators.s3_file_transform_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_file_transform_operator/index.html>
> >> >>> airflow.operators.s3_to_hive_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_hive_operator/index.html>
> >> >>> airflow.operators.s3_to_redshift_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_redshift_operator/index.html>
> >> >>> airflow.sensors.s3_key_sensor <http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_key_sensor/index.html>
> >> >>> airflow.sensors.s3_prefix_sensor <http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_prefix_sensor/index.html>
> >> >>> airflow.contrib.sensors.emr_base_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_base_sensor/index.html>
> >> >>> airflow.contrib.sensors.emr_job_flow_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_job_flow_sensor/index.html>
> >> >>> airflow.contrib.sensors.emr_step_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_step_sensor/index.html>
> >> >>>
> >> >>> And that was just before I got bored of looking for them :)
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>>
> >> >>>> On 5 Feb 2019, at 16:25, Kamil Breguła <kamil.bregula@polidea.com> wrote:
> >> >>>>
> >> >>>> I already have a POC: :-)
> >> >>>>
> >> >>>> Available at: http://level-can.surge.sh/html/autoapi/index.html
> >> >>>>
> >> >>>> I would like to point out that in addition to class documentation, you
> >> >>>> can also document modules:
> >> >>>> http://level-can.surge.sh/html/autoapi/airflow/executors/local_executor/index.html
> >> >>>> Currently, the `howto/operators.rst` file is used for this (related link:
> >> >>>> https://airflow.readthedocs.io/en/latest/howto/operator.html#cloudsqlqueryoperator
> >> >>>> )
> >> >>>>
> >> >>>>
> >> >>>> On Tue, Feb 5, 2019 at 5:18 PM Ash Berlin-Taylor <ash@apache.org> wrote:
> >> >>>>
> >> >>>>>> We want to rewrite the `integration.rst` file so that it does not
> >> >>>>>> contain duplicates from `code.rst` (API Reference). In the next step,
> >> >>>>>> we want to introduce API reference generation based on the source
> >> >>>>>> code, which will replace the `code.rst` file.
> >> >>>>>
> >> >>>>> :100: Yes please!
> >> >>>>>
> >> >>>>>
> >> >>>>> Given a number of integrations are across multiple files (n operators
> >> >>>>> and m hooks), my first thought is that something in integration.rst, or
> >> >>>>> a file elsewhere in the docs/ tree, is the place to put this.
> >> >>>>>
> >> >>>>> On epydoc vs a sphinx extension I lean very heavily towards the sphinx
> >> >>>>> extension, as we are already using much of sphinx.
> >> >>>>>
> >> >>>>> Can you create a _small_ example of what you'd propose for no. 4? (I
> >> >>>>> don't want you to do a lot of work that might be wasted.)
> >> >>>>>
> >> >>>>> -ash
> >> >>>>>
> >> >>>>>
> >> >>>>>> On 5 Feb 2019, at 15:55, Kamil Breguła <kamil.bregula@polidea.com> wrote:
> >> >>>>>>
> >> >>>>>> Hello community,
> >> >>>>>>
> >> >>>>>> While working on the documentation for the GCP operators, my team at
> >> >>>>>> Polidea encountered some confusion related to the structure of the
> >> >>>>>> documentation.
> >> >>>>>>
> >> >>>>>> Short story:
> >> >>>>>>
> >> >>>>>> We want to rewrite the `integration.rst` file so that it does not
> >> >>>>>> contain duplicates from `code.rst` (API Reference). In the next step,
> >> >>>>>> we want to introduce API reference generation based on the source
> >> >>>>>> code, which will replace the `code.rst` file.
> >> >>>>>>
> >> >>>>>> Long story:
> >> >>>>>>
> >> >>>>>> Currently, the documentation contains two places where the description
> >> >>>>>> of classes related to operators is included. They are the `code.rst`
> >> >>>>>> and `integration.rst` files.
> >> >>>>>>
> >> >>>>>> The `integration.rst` file contains information about integrations, in
> >> >>>>>> particular for Azure: Microsoft Azure, AWS: Amazon Web Services,
> >> >>>>>> Databricks, GCP: Google Cloud Platform, Qubole. Other integrations,
> >> >>>>>> however, do not have descriptions.
> >> >>>>>>
> >> >>>>>> The `code.rst` file contains the “API Reference”, which holds
> >> >>>>>> information about *all* classes, including those included in the
> >> >>>>>> `integration.rst` file.
> >> >>>>>>
> >> >>>>>> Such duplication, however, is problematic for several reasons:
> >> >>>>>>
> >> >>>>>> 1. Users may feel lost and may not know which section they should look
> >> >>>>>>    into.
> >> >>>>>> 2. Changes must be made in many places, which leads to
> >> >>>>>>    desynchronization. Most often, changes are made only in the source
> >> >>>>>>    code, so they do not appear in the generated documentation.
> >> >>>>>> 3. Linking to classes using the `class` role for Sphinx is ambiguous:
> >> >>>>>>    if the class is embedded both in `integration.rst` and `code.rst`
> >> >>>>>>    using the `autoclass` directive, we’re not sure where the user will
> >> >>>>>>    be navigated.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> There are several solutions:
> >> >>>>>>
> >> >>>>>> 1. Leave it as is. Then we need to agree on which `autoclass`
> >> >>>>>>    directives should have the `noindex` flag.
> >> >>>>>> 2. Delete duplicates from the `code.rst` file and add a note about the
> >> >>>>>>    `integration.rst` file in the `code.rst` file.
> >> >>>>>> 3. Delete duplicates from the `integration.rst` file and add a note
> >> >>>>>>    about the `code.rst` file in the `integration.rst` file.
> >> >>>>>> 4. Delete information from both files and generate the API
> >> >>>>>>    documentation always based only on the source code. This solution
> >> >>>>>>    means that we would have to write less documentation.
> >> >>>>>>    There are ready tools that we can use:
> >> >>>>>>    1. epydoc - http://epydoc.sourceforge.net/
> >> >>>>>>    2. autoapi extension for Sphinx -
> >> >>>>>>       https://github.com/rtfd/sphinx-autoapi
> >> >>>>>>    3. other - https://wiki.python.org/moin/DocumentationTools
> >> >>>>>>
> >> >>>>>> The first three solutions do not solve all problems. In particular, we
> >> >>>>>> still need to complete the `code.rst` and `integration.rst` files. The
> >> >>>>>> fourth solution solves all problems, but is the most complex. It is
> >> >>>>>> worth noting that mixing solutions is possible. For example, we can
> >> >>>>>> delete information from the `integration.rst` file as a short-term
> >> >>>>>> solution and start working on creating another form of documentation
> >> >>>>>> as a long-term solution. This is the best option in our opinion.
> >> >>>>>>
> >> >>>>>> I’ve recently done a few activities related to this topic.
> >> >>>>>>
> >> >>>>>> First, I added the `noindex` flag to the `autoclass` directives for all
> >> >>>>>> operators in the `integration.rst` file. In rare cases (if any), this
> >> >>>>>> caused links that previously pointed to the `integration.rst` file to
> >> >>>>>> be redirected to the `code.rst` file. Elements had to be linked using
> >> >>>>>> `:class:` instead of a section link. Each operator is included in the
> >> >>>>>> new section in this file.
> >> >>>>>>
> >> >>>>>> PR: https://github.com/apache/airflow/pull/4585
> >> >>>>>> <https://github.com/apache/airflow/pull/4585/files>
> >> >>>>>>
> >> >>>>>> Second, I completed the `code.rst` file with the missing classes.
> >> >>>>>>
> >> >>>>>> PR: https://github.com/apache/airflow/pull/4644
> >> >>>>>>
> >> >>>>>> I would like to ask which solution is the best in your opinion? What
> >> >>>>>> steps should we take to make the documentation more enjoyable?
> >> >>>>>>
> >> >>>>>> Greetings
> >> >>>>>>
> >> >>>>>> Kamil Breguła
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>> --
> >> >>>>
> >> >>>> Kamil Breguła
> >> >>>> Polidea <https://www.polidea.com/> | Software Engineer
> >> >>>>
> >> >>>> M: +48 505 458 451 <+48505458451>
> >> >>>> E: kamil.bregula@polidea.com
> >> >>>
> >> >>>
> >> >>
> >> >
> >>
> >
> >
> > --
> >
> > Kamil Breguła
> > Polidea | Software Engineer
> >
> > M: +48 505 458 451
> > E: kamil.bregula@polidea.com
> >
> > We create human & business stories through technology.
> > Check out our projects!
>
>
>
>
