airflow-dev mailing list archives

From Ash Berlin-Taylor <...@apache.org>
Subject Re: API Reference - current confusion and improvement plan
Date Tue, 05 Feb 2019 19:57:49 GMT
I like the API reference v2 layout a lot! It is much easier to navigate and to see what
classes are available, for me at least.

Documenting modules will help somewhat with a few things, but take the "AWS" section of
the integration doc as an example: it is spread across the following modules:

airflow.contrib.operators.aws_athena_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/aws_athena_operator/index.html>
airflow.contrib.operators.awsbatch_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/awsbatch_operator/index.html>
airflow.contrib.operators.ecs_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/ecs_operator/index.html>
airflow.contrib.operators.emr_add_steps_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_add_steps_operator/index.html>
airflow.contrib.operators.emr_create_job_flow_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_create_job_flow_operator/index.html>
airflow.contrib.operators.emr_terminate_job_flow_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_terminate_job_flow_operator/index.html>
airflow.contrib.operators.s3_copy_object_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_copy_object_operator/index.html>
airflow.contrib.operators.s3_delete_objects_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_delete_objects_operator/index.html>
airflow.contrib.operators.s3_list_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_list_operator/index.html>
airflow.contrib.operators.s3_to_gcs_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs_operator/index.html>
airflow.contrib.operators.s3_to_gcs_transfer_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs_transfer_operator/index.html>
airflow.contrib.operators.s3_to_sftp_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_sftp_operator/index.html>
airflow.contrib.operators.sagemaker_base_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_base_operator/index.html>
airflow.contrib.operators.sagemaker_endpoint_config_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_endpoint_config_operator/index.html>
airflow.contrib.operators.sagemaker_endpoint_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_endpoint_operator/index.html>
airflow.contrib.operators.sagemaker_model_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_model_operator/index.html>
airflow.contrib.operators.sagemaker_training_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_training_operator/index.html>
airflow.contrib.operators.sagemaker_transform_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_transform_operator/index.html>
airflow.contrib.operators.sagemaker_tuning_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_tuning_operator/index.html>
airflow.contrib.operators.segment_track_event_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/segment_track_event_operator/index.html>
airflow.operators.redshift_to_s3_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/redshift_to_s3_operator/index.html>
airflow.operators.s3_file_transform_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_file_transform_operator/index.html>
airflow.operators.s3_to_hive_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_hive_operator/index.html>
airflow.operators.s3_to_redshift_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_redshift_operator/index.html>
airflow.sensors.s3_key_sensor <http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_key_sensor/index.html>
airflow.sensors.s3_prefix_sensor <http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_prefix_sensor/index.html>
airflow.contrib.sensors.emr_base_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_base_sensor/index.html>
airflow.contrib.sensors.emr_job_flow_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_job_flow_sensor/index.html>
airflow.contrib.sensors.emr_step_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_step_sensor/index.html>

And that was just as far as I got before I got bored of looking for them :)
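
To illustrate how scattered these are, here is a rough sketch that groups dotted module names by parent package when the last component looks AWS-related. The keyword list and the function name `group_aws_modules` are assumptions made for illustration; nothing like this exists in Airflow itself.

```python
import re
from collections import defaultdict

# Hypothetical keyword list: which name fragments count as "AWS" is an
# assumption made for this sketch, not something defined by Airflow.
AWS_PATTERN = re.compile(r"(^|_)(aws|awsbatch|ecs|emr|s3|sagemaker|redshift)(_|$)")


def group_aws_modules(module_names):
    """Group AWS-looking dotted module names by their parent package."""
    groups = defaultdict(list)
    for name in module_names:
        # Split "airflow.sensors.s3_key_sensor" into package and leaf module.
        package, _, leaf = name.rpartition(".")
        if AWS_PATTERN.search(leaf):
            groups[package].append(leaf)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "airflow.contrib.operators.aws_athena_operator",
        "airflow.operators.redshift_to_s3_operator",
        "airflow.sensors.s3_key_sensor",
        "airflow.executors.local_executor",
    ]
    for package, leaves in sorted(group_aws_modules(sample).items()):
        print(package, leaves)
```

The module names could also come from something like `pkgutil.walk_packages`. Either way, one integration ends up in several packages, which is the problem with per-module docs alone.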




> 
> On 5 Feb 2019, at 16:25, Kamil Breguła <kamil.bregula@polidea.com> wrote:
> 
> I already have a POC :-)
> 
> Available at: http://level-can.surge.sh/html/autoapi/index.html
> 
> I would like to point out that in addition to class documentation, you can
> also document modules.
> http://level-can.surge.sh/html/autoapi/airflow/executors/local_executor/index.html
> Currently, the `howto/operator.rst` file is used for this (related link:
> https://airflow.readthedocs.io/en/latest/howto/operator.html#cloudsqlqueryoperator)
> 
> 
> On Tue, Feb 5, 2019 at 5:18 PM Ash Berlin-Taylor <ash@apache.org> wrote:
> 
>>> We want to rewrite the `integration.rst` file so that it does not contain
>>> duplicates from `code.rst` (API Reference). In the next step, introduce
>>> the reference API generation based on the source code that will replace
>>> the `code.rst` file.
>> 
>> :100: Yes please!
>> 
>> 
>> Given that a number of integrations are spread across multiple files (n
>> operators and m hooks), my first thought is that integration.rst, or a file
>> elsewhere in the docs/ tree, is the place to put this.
>> 
>> On epydoc vs a sphinx extension I lean very heavily towards the sphinx
>> extension, as we are already using much of sphinx.
>> 
>> Can you create a _small_ example of what you'd propose for no. 4? (I don't
>> want you to do a lot of work that might be wasted.)
>> 
>> -ash
>> 
>> 
>>> On 5 Feb 2019, at 15:55, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>>> 
>>> Hello community,
>>> 
>>> While working on the documentation for the GCP operators, my team at
>>> Polidea encountered some confusion related to the structure of the
>>> documentation.
>>> 
>>> Short story:
>>> 
>>> We want to rewrite the `integration.rst` file so that it does not contain
>>> duplicates from `code.rst` (API Reference). In the next step, introduce
>>> the reference API generation based on the source code that will replace
>>> the `code.rst` file.
>>> 
>>> Long story:
>>> 
>>> Currently, the documentation contains two places where the description of
>>> classes related to operators is included. They are `code.rst` and
>>> `integration.rst` files.
>>> 
>>> The `integration.rst` file contains information about integration, in
>>> particular for Azure: Microsoft Azure, AWS: Amazon Web Services,
>>> Databricks, GCP: Google Cloud Platform, Qubole. Other integrations,
>>> however, do not have descriptions.
>>> 
>>> The `code.rst` file contains the “API Reference”, which covers *all*
>>> classes, including those already included in the `integration.rst` file.
>>> 
>>> Such duplication, however, is problematic for several reasons:
>>> 
>>>  1. Users may feel lost and may not know which section they should look
>>>     into.
>>>  2. Changes must be made in many places, which leads to desynchronization.
>>>     Most often, changes are made only in the source code, so they do not
>>>     appear in the generated documentation.
>>>  3. Linking to classes using the `class` role for Sphinx is ambiguous: if
>>>     the class is embedded both in `integration.rst` and `code.rst` using
>>>     the `autoclass` directive, we’re not sure where the user will be
>>>     navigated.
>>> 
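
A quick way to see how much is duplicated: the sketch below scans two reST sources for `autoclass` directives and reports the targets that appear in both. The sample text is invented for illustration and is not the real content of either file; the helper names are hypothetical.

```python
import re

# Matches lines such as ".. autoclass:: package.module.ClassName"
AUTOCLASS = re.compile(r"^\s*\.\.\s+autoclass::\s+(\S+)", re.MULTILINE)


def autoclass_targets(rst_text):
    """Return the set of class paths named by autoclass directives."""
    return set(AUTOCLASS.findall(rst_text))


def duplicated_targets(first_rst, second_rst):
    """Return the sorted class paths documented in both reST sources."""
    return sorted(autoclass_targets(first_rst) & autoclass_targets(second_rst))


if __name__ == "__main__":
    # Made-up snippets standing in for integration.rst and code.rst.
    integration = """
.. autoclass:: airflow.contrib.operators.ecs_operator.ECSOperator
    :noindex:
.. autoclass:: airflow.sensors.s3_key_sensor.S3KeySensor
"""
    code = """
.. autoclass:: airflow.sensors.s3_key_sensor.S3KeySensor
.. autoclass:: airflow.operators.bash_operator.BashOperator
"""
    print(duplicated_targets(integration, code))
```

In practice the two strings would be read from the files under docs/. Every path it reports is a class whose `:class:` link could resolve to either page unless one directive carries `noindex`.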
>>> 
>>> There are several solutions:
>>> 
>>>  1. Leave it as is. Then we need to agree on which `autoclass` directives
>>>     should have the `noindex` flag.
>>>  2. Delete duplicates from the `code.rst` file and add a note about the
>>>     `integration.rst` file in the `code.rst` file.
>>>  3. Delete duplicates from the `integration.rst` file and add a note about
>>>     the `code.rst` file in the `integration.rst` file.
>>>  4. Delete information from both files and always generate the API
>>>     documentation from the source code alone. This solution means that we
>>>     would have to write less documentation.
>>>     There are ready-made tools that we can use:
>>>     1. epydoc - http://epydoc.sourceforge.net/ ;
>>>     2. autoapi extension for Sphinx - https://github.com/rtfd/sphinx-autoapi ;
>>>     3. other - https://wiki.python.org/moin/DocumentationTools
>>> 
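
For option 4 with the Sphinx extension, the configuration could look roughly like the fragment below. This is a minimal sketch, not the project's actual `conf.py`: the source path and the output directory name are assumptions, and only the extension name and the `autoapi_*` option names come from sphinx-autoapi.

```python
# docs/conf.py (fragment): a sketch of enabling sphinx-autoapi.

extensions = [
    "sphinx.ext.autodoc",
    "autoapi.extension",  # provided by the sphinx-autoapi package
]

# Tell autoapi which language to parse and where the sources live.
autoapi_type = "python"
autoapi_dirs = ["../airflow"]  # assumed location of the package relative to docs/

# Directory (under the docs source root) where generated pages are written.
autoapi_root = "autoapi"
```

With something like this in place, the per-module and per-class pages are generated at build time, so there would be no hand-maintained list of `autoclass` directives to keep in sync.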
>>> 
>>> The first, second, and third solutions do not solve all problems. In
>>> particular, we would still need to complete the `code.rst` and
>>> `integration.rst` files. The fourth solution solves all problems, but is
>>> the most complex. It is worth noting that mixing solutions is possible.
>>> For example, we can delete information from the `integration.rst` file as
>>> a short-term solution and start working on creating another form of
>>> documentation as a long-term solution. This is the best option in our
>>> opinion.
>>> 
>>> I’ve recently done some work related to this topic.
>>> 
>>> First, I added the `noindex` flag to the `autoclass` directives for all
>>> operators in the `integration.rst` file. In rare cases (if any), this
>>> caused links that previously pointed to the `integration.rst` file to be
>>> redirected to the `code.rst` file. Elements had to be linked using
>>> `:class:` instead of a section link. Each operator is included in the new
>>> section in this file.
>>> 
>>> PR: https://github.com/apache/airflow/pull/4585
>>> <https://github.com/apache/airflow/pull/4585/files>
>>> 
>>> Second, I completed the `code.rst` file with the missing classes.
>>> 
>>> PR: https://github.com/apache/airflow/pull/4644
>>> 
>>> Which solution do you think is best? What steps should we take to make
>>> the documentation more enjoyable?
>>> 
>>> Greetings
>>> 
>>> Kamil Breguła
>> 
>> 
> 
> -- 
> 
> Kamil Breguła
> Polidea <https://www.polidea.com/> | Software Engineer
> 
> M: +48 505 458 451 <+48505458451>
> E: kamil.bregula@polidea.com

