airflow-dev mailing list archives

From Szymon Przedwojski <szymon.przedwoj...@polidea.com>
Subject Re: API Reference - current confusion and improvement plan
Date Wed, 06 Feb 2019 13:01:34 GMT
+1 
I also like the new docs layout and the big win is that it’s generated automatically from
all files and we won’t have to modify code.rst / integration.rst manually anymore.
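
For readers who have not set up this kind of generation, here is a minimal sketch of a Sphinx `conf.py` fragment, assuming the sphinx-autoapi extension mentioned later in the thread is the tool used. The directory paths are assumptions for illustration, not taken from the Airflow repository or from the POC.

```python
# docs/conf.py (fragment) - illustrative sketch only; paths are assumed.

# In a real conf.py this entry would be appended to the existing list
# of extensions rather than replacing it.
extensions = [
    "autoapi.extension",  # provided by the sphinx-autoapi package
]

# Parse Python sources and generate one reference page per module/class.
autoapi_type = "python"

# Directories to scan, relative to the docs source directory (assumed layout).
autoapi_dirs = ["../airflow"]

# Output folder for the generated pages, relative to the docs source directory.
autoapi_root = "autoapi"
```

With settings along these lines, new modules would show up in the reference on the next docs build without anyone editing an `.rst` file by hand.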

Szymon Przedwojski
Polidea | Software Engineer

M: +48 500 330 790
E: szymon.przedwojski@polidea.com 

> On 5 Feb 2019, at 21:33, Ash Berlin-Taylor <ash@apache.org> wrote:
> 
> I have idly wondered about something like this as a layout
> 
>    from airflow.$something.aws.operators import EmrAddStepOperator
> 
> - Grouping by service provider is more helpful
> - Having more than one operator per module
> - Not having `_operator` (etc.) suffix on the module, and the class, and the module path
> 
> Perhaps a bigger change - though to make it much less painful for our users we could keep
> the old names with a deprecation warning or two (even past 2.0, to say 2.1). Out of scope for
> the current discussion.
> 
> -ash
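
The "keep the old names with a deprecation warning" idea above can be sketched in plain Python. This is only an illustration under stated assumptions: the operator class below is a stand-in rather than the real Airflow class, `deprecated_alias` is a hypothetical helper, and the new import path keeps the `$something` placeholder from the message because no final name was proposed.

```python
import warnings


class EmrAddStepsOperator:
    """Stand-in for an operator that has moved to a provider-grouped module."""

    def __init__(self, task_id):
        self.task_id = task_id


def deprecated_alias(new_cls, old_path, new_path):
    """Hypothetical helper: subclass new_cls so that creating it warns."""

    class _Deprecated(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                "{} is deprecated; import {} instead.".format(old_path, new_path),
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            super().__init__(*args, **kwargs)

    _Deprecated.__name__ = new_cls.__name__
    return _Deprecated


# What the old module could keep exporting (both paths are illustrative).
LegacyEmrAddStepsOperator = deprecated_alias(
    EmrAddStepsOperator,
    "airflow.contrib.operators.emr_add_steps_operator.EmrAddStepsOperator",
    "airflow.$something.aws.operators.EmrAddStepsOperator",
)
```

Existing DAGs that import the old name would keep working and only see a `DeprecationWarning`, while `isinstance` checks against the new class still pass because the alias is a subclass.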
> 
>> On 5 Feb 2019, at 20:22, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>> 
>> I think that we should group operators by service (e.g. Amazon Web Services:
>> Simple Storage Service). One module per service. It will be much easier to
>> navigate through them. A similar problem occurs with the Google Cloud
>> Storage service, but we have a solution (PR:
>> https://github.com/apache/airflow/pull/3000 ). A large part of the existing
>> operators, as well as future operators written in accordance with the recommendations (
>> https://lists.apache.org/thread.html/e8534d82be611ae7bcb21ba371546a4278aad117d5e50361fd8f14fe@%3Cdev.airflow.apache.org%3E),
>> follow these rules.
>> 
>> The problem will be with operators that integrate two services at the same
>> time. I think that we can leave them in a separate module and link to this
>> class in the description of the module.
>> 
>> However, this is not a current problem. I just wanted to note future
>> improvements, which become possible if we introduce the proposed solution.
>> 
>> On Tue, Feb 5, 2019 at 8:57 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>> 
>>> I like the API reference v2 layout a lot! Much easier to navigate and see
>>> what classes are available, for me at least
>>> 
>>> Documenting modules will help somewhat with a few things but, let's say, the
>>> "AWS" section of the integration doc is spread across the following modules:
>>> 
>>> airflow.contrib.operators.aws_athena_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/aws_athena_operator/index.html>
>>> airflow.contrib.operators.awsbatch_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/awsbatch_operator/index.html>
>>> airflow.contrib.operators.ecs_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/ecs_operator/index.html>
>>> airflow.contrib.operators.emr_add_steps_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_add_steps_operator/index.html>
>>> airflow.contrib.operators.emr_create_job_flow_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_create_job_flow_operator/index.html>
>>> airflow.contrib.operators.emr_terminate_job_flow_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_terminate_job_flow_operator/index.html>
>>> airflow.contrib.operators.s3_copy_object_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_copy_object_operator/index.html>
>>> airflow.contrib.operators.s3_delete_objects_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_delete_objects_operator/index.html>
>>> airflow.contrib.operators.s3_list_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_list_operator/index.html>
>>> airflow.contrib.operators.s3_to_gcs_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs_operator/index.html>
>>> airflow.contrib.operators.s3_to_gcs_transfer_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs_transfer_operator/index.html>
>>> airflow.contrib.operators.s3_to_sftp_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_sftp_operator/index.html>
>>> airflow.contrib.operators.sagemaker_base_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_base_operator/index.html>
>>> airflow.contrib.operators.sagemaker_endpoint_config_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_endpoint_config_operator/index.html>
>>> airflow.contrib.operators.sagemaker_endpoint_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_endpoint_operator/index.html>
>>> airflow.contrib.operators.sagemaker_model_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_model_operator/index.html>
>>> airflow.contrib.operators.sagemaker_training_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_training_operator/index.html>
>>> airflow.contrib.operators.sagemaker_transform_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_transform_operator/index.html>
>>> airflow.contrib.operators.sagemaker_tuning_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_tuning_operator/index.html>
>>> airflow.contrib.operators.segment_track_event_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/segment_track_event_operator/index.html>
>>> airflow.operators.redshift_to_s3_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/redshift_to_s3_operator/index.html>
>>> airflow.operators.s3_file_transform_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_file_transform_operator/index.html>
>>> airflow.operators.s3_to_hive_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_hive_operator/index.html>
>>> airflow.operators.s3_to_redshift_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_redshift_operator/index.html>
>>> airflow.sensors.s3_key_sensor <http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_key_sensor/index.html>
>>> airflow.sensors.s3_prefix_sensor <http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_prefix_sensor/index.html>
>>> airflow.contrib.sensors.emr_base_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_base_sensor/index.html>
>>> airflow.contrib.sensors.emr_job_flow_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_job_flow_sensor/index.html>
>>> airflow.contrib.sensors.emr_step_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_step_sensor/index.html>
>>> 
>>> And that was just before I got bored of looking for them :)
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> On 5 Feb 2019, at 16:25, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>>>> 
>>>> I already have a POC: :-)
>>>> 
>>>> Available at: http://level-can.surge.sh/html/autoapi/index.html
>>>> 
>>>> I would like to point out that in addition to class documentation, you can
>>>> also document modules:
>>>> http://level-can.surge.sh/html/autoapi/airflow/executors/local_executor/index.html
>>>> Currently, the `howto/operator.rst` file is used for this (Related link:
>>>> https://airflow.readthedocs.io/en/latest/howto/operator.html#cloudsqlqueryoperator )
>>>> 
>>>> 
>>>> On Tue, Feb 5, 2019 at 5:18 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>>>> 
>>>>>> We want to rewrite the `integration.rst` file so that it does not contain
>>>>>> duplicates from `code.rst` (API Reference). In the next step, introduce
>>>>>> the reference API generation based on the source code that will replace the
>>>>>> `code.rst` file.
>>>>> 
>>>>> :100: Yes please!
>>>>> 
>>>>> 
>>>>> Given a number of integrations are across multiple files (n operators, and
>>>>> m hooks) my first thought is that something in integration.rst, or a file
>>>>> elsewhere in the docs/ tree is the place to put this.
>>>>>
>>>>> On epydoc vs a sphinx extension I lean very heavily towards the sphinx
>>>>> extension, as we are already using much of sphinx.
>>>>>
>>>>> Can you create a _small_ example of what you'd propose for no.4 (I don't
>>>>> want you to do a lot of work that might be wasted)
>>>>> 
>>>>> -ash
>>>>> 
>>>>> 
>>>>>> On 5 Feb 2019, at 15:55, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>>>>>> 
>>>>>> Hello community,
>>>>>> 
>>>>>> While working on the documentation for the GCP operators, my team at
>>>>>> Polidea encountered some confusion related to the structure of the
>>>>>> documentation.
>>>>>>
>>>>>> Short story:
>>>>>>
>>>>>> We want to rewrite the `integration.rst` file so that it does not contain
>>>>>> duplicates from `code.rst` (API Reference). In the next step, introduce
>>>>>> the reference API generation based on the source code that will replace the
>>>>>> `code.rst` file.
>>>>>> 
>>>>>> Long story:
>>>>>> 
>>>>>> Currently, the documentation contains two places where the description of
>>>>>> classes related to operators is included. They are the `code.rst` and
>>>>>> `integration.rst` files.
>>>>>>
>>>>>> The `integration.rst` file contains information about integrations, in
>>>>>> particular for Azure: Microsoft Azure, AWS: Amazon Web Services,
>>>>>> Databricks, GCP: Google Cloud Platform, Qubole. Other integrations,
>>>>>> however, do not have descriptions.
>>>>>>
>>>>>> The `code.rst` file contains the “API Reference”, which covers
>>>>>> *all* classes, including those already included in the `integration.rst` file.
>>>>>> 
>>>>>> Such duplication, however, is problematic for several reasons:
>>>>>>
>>>>>> 1. Users may feel lost and may not know which section they should look into.
>>>>>>
>>>>>> 2. Changes must be made in many places, which leads to desynchronization.
>>>>>>    Most often, changes are made only in the source code, so they do not appear
>>>>>>    in the generated documentation.
>>>>>>
>>>>>> 3. Linking to classes using the `:class:` role for Sphinx is
>>>>>>    ambiguous - if the class is embedded both in `integration.rst` and
>>>>>>    `code.rst` using the `autoclass` directive, we’re not sure where the user
>>>>>>    will be navigated.
>>>>>> 
>>>>>> 
>>>>>> There are several solutions:
>>>>>>
>>>>>> 1. Leave it as is. Then we need to agree on which `autoclass` directives
>>>>>>    should have the `noindex` flag.
>>>>>>
>>>>>> 2. Delete duplicates from the `code.rst` file and add a note about the
>>>>>>    `integration.rst` file in the `code.rst` file.
>>>>>>
>>>>>> 3. Delete duplicates from the `integration.rst` file and add a note about
>>>>>>    the `code.rst` file in the `integration.rst` file.
>>>>>>
>>>>>> 4. Delete information from both files and generate the API documentation
>>>>>>    always based only on the source code. This solution means that we would
>>>>>>    have to write less documentation.
>>>>>>    There are ready tools that we can use:
>>>>>>
>>>>>>    1. epydoc - http://epydoc.sourceforge.net/ ;
>>>>>>    2. autoapi extension for Sphinx - https://github.com/rtfd/sphinx-autoapi ;
>>>>>>    3. other - https://wiki.python.org/moin/DocumentationTools
>>>>>> 
>>>>>> 
>>>>>> The first, second, and third solutions do not solve all problems. In
>>>>>> particular, we still need to complete the `code.rst` and `integration.rst`
>>>>>> files. The fourth solution solves all problems, but is the most complex. It
>>>>>> is worth noting that mixing solutions is possible. For example, we can
>>>>>> delete information from the `integration.rst` file as a short-term solution
>>>>>> and start working on creating another form of documentation as a long-term
>>>>>> solution. This is the best option in our opinion.
>>>>>> 
>>>>>> I’ve recently done some work related to this topic.
>>>>>>
>>>>>> First, I added the `noindex` flag to the `autoclass` directives for all operators
>>>>>> in the `integration.rst` file. In rare cases (if any), this caused links that
>>>>>> previously pointed to the `integration.rst` file to be redirected to
>>>>>> the `code.rst` file. Elements had to be linked using `:class:` instead of a
>>>>>> section link. Each operator is included in the new section in this file.
>>>>>> 
>>>>>> PR: https://github.com/apache/airflow/pull/4585
>>>>>> <https://github.com/apache/airflow/pull/4585/files>
>>>>>> 
>>>>>> Second, I completed the `code.rst` file with the missing classes.
>>>>>> 
>>>>>> PR: https://github.com/apache/airflow/pull/4644
>>>>>> 
>>>>>> I would like to ask which solution is the best in your opinion? What steps
>>>>>> should we take to make the documentation more enjoyable?
>>>>>> 
>>>>>> Greetings
>>>>>> 
>>>>>> Kamil Breguła
>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> 
>>>> Kamil Breguła
>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>> 
>>>> M: +48 505 458 451 <+48505458451>
>>>> E: kamil.bregula@polidea.com
>>> 
>>> 
>> 
>> -- 
>> 
>> Kamil Breguła
>> Polidea <https://www.polidea.com/> | Software Engineer
>> 
>> M: +48 505 458 451 <+48505458451>
>> E: kamil.bregula@polidea.com
> 

