From dev-return-7587-archive-asf-public=cust-asf.ponee.io@airflow.apache.org Tue Feb 5 20:33:44 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 161A8180608 for ; Tue, 5 Feb 2019 21:33:42 +0100 (CET) Received: (qmail 16287 invoked by uid 500); 5 Feb 2019 20:33:41 -0000 Mailing-List: contact dev-help@airflow.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@airflow.apache.org Delivered-To: mailing list dev@airflow.apache.org Received: (qmail 16276 invoked by uid 99); 5 Feb 2019 20:33:41 -0000 Received: from mail-relay.apache.org (HELO mailrelay2-lw-us.apache.org) (207.244.88.137) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Feb 2019 20:33:41 +0000 Received: from themisto.localdomain (231.25.169.217.in-addr.arpa [217.169.25.231]) by mailrelay2-lw-us.apache.org (ASF Mail Server at mailrelay2-lw-us.apache.org) with ESMTPSA id 9E7356BBA for ; Tue, 5 Feb 2019 20:33:40 +0000 (UTC) From: Ash Berlin-Taylor Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: API Reference - current confusion and improvement plan Date: Tue, 5 Feb 2019 20:33:38 +0000 References: <96B598EB-6278-4DD6-8CE3-9571EA384FDF@apache.org> To: dev@airflow.apache.org In-Reply-To: Message-Id: <29EBA31E-E16E-4332-8452-C057D0A5B808@apache.org> X-Mailer: Apple Mail (2.3273) I have idly wondered about something like this as a layout from airflow.$something.aws.operators import EmrAddStepOperator - Grouping by service provider is more helpful - Having more than one operator per module - Not having `_operator` (etc.) suffix on the modue, and the class, and = the module path Perhaps a bigger change - though to make it much less painful on our = users we could keep the old names with a deprecation warning or two = (even past 2.0, to say 2.1) Out of scope for current discussion. -ash > On 5 Feb 2019, at 20:22, Kamil Bregu=C5=82a = wrote: >=20 > I think that we should group operators by service (ex. Amazon Web = Service: > Simple Cloud Storage). One module to one service. it will be much = easier to > navigate through them. A similar problem occurs with the Google Cloud > Storage service, but we have a solution (PR: > https://github.com/apache/airflow/pull/3000 ). A large part and future > operators, which are written in accordance with the recommendations ( > = https://lists.apache.org/thread.html/e8534d82be611ae7bcb21ba371546a4278aad= 117d5e50361fd8f14fe@%3Cdev.airflow.apache.org%3E), > follow these rules. >=20 > The problem will be with operators that integrate two services at the = same > time. I think that we can leave them in a separate module and link to = this > class in the description of the module. >=20 > However, this is not a current problem. I just wanted to mark future > improvements, which is possible if we introduce the proposed solution. >=20 > On Tue, Feb 5, 2019 at 8:57 PM Ash Berlin-Taylor = wrote: >=20 >> I like the API reference v2 layout a lot! Much easier to navigate and = see >> what classes are available, for me at least >>=20 >> Documenting modules will help somewhat with a few things but, lets = say the >> "AWS" section of the integration doc is across the following modules: >>=20 >> airflow.contrib.operators.aws_athena_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/aws_athen= a_operator/index.html >>>=20 >> airflow.contrib.operators.awsbatch_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/awsbatch_= operator/index.html >>>=20 >> airflow.contrib.operators.ecs_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/ecs_opera= tor/index.html >>>=20 >> airflow.contrib.operators.emr_add_steps_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_add_s= teps_operator/index.html >>>=20 >> airflow.contrib.operators.emr_create_job_flow_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_creat= e_job_flow_operator/index.html >>>=20 >> airflow.contrib.operators.emr_terminate_job_flow_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_termi= nate_job_flow_operator/index.html >>>=20 >> airflow.contrib.operators.s3_copy_object_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_copy_o= bject_operator/index.html >>>=20 >> airflow.contrib.operators.s3_delete_objects_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_delete= _objects_operator/index.html >>>=20 >> airflow.contrib.operators.s3_list_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_list_o= perator/index.html >>>=20 >> airflow.contrib.operators.s3_to_gcs_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs= _operator/index.html >>>=20 >> airflow.contrib.operators.s3_to_gcs_transfer_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs= _transfer_operator/index.html >>>=20 >> airflow.contrib.operators.s3_to_sftp_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_sft= p_operator/index.html >>>=20 >> airflow.contrib.operators.sagemaker_base_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker= _base_operator/index.html >>>=20 >> airflow.contrib.operators.sagemaker_endpoint_config_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker= _endpoint_config_operator/index.html >>>=20 >> airflow.contrib.operators.sagemaker_endpoint_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker= _endpoint_operator/index.html >>>=20 >> airflow.contrib.operators.sagemaker_model_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker= _model_operator/index.html >>>=20 >> airflow.contrib.operators.sagemaker_training_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker= _training_operator/index.html >>>=20 >> airflow.contrib.operators.sagemaker_transform_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker= _transform_operator/index.html >>>=20 >> airflow.contrib.operators.sagemaker_tuning_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker= _tuning_operator/index.html >>>=20 >> airflow.contrib.operators.segment_track_event_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/segment_t= rack_event_operator/index.html >>>=20 >> airflow.operators.redshift_to_s3_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/operators/redshift_to_s3_op= erator/index.html >>>=20 >> airflow.operators.s3_file_transform_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/operators/s3_file_transform= _operator/index.html >>>=20 >> airflow.operators.s3_to_hive_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_hive_operat= or/index.html >>>=20 >> airflow.operators.s3_to_redshift_operator < >> = http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_redshift_op= erator/index.html >>>=20 >> airflow.sensors.s3_key_sensor < >> = http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_key_sensor/index= .html >>>=20 >> airflow.sensors.s3_prefix_sensor < >> = http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_prefix_sensor/in= dex.html >>>=20 >> airflow.contrib.sensors.emr_base_sensor < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_base_se= nsor/index.html >>>=20 >> airflow.contrib.sensors.emr_job_flow_sensor < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_job_flo= w_sensor/index.html >>>=20 >> airflow.contrib.sensors.emr_step_sensor < >> = http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_step_se= nsor/index.html >>>=20 >>=20 >> And that was just before I got bored of looking for them :) >>=20 >>=20 >>=20 >>=20 >>>=20 >>> On 5 Feb 2019, at 16:25, Kamil Bregu=C5=82a = >> wrote: >>>=20 >>> I already have a POC: :-) >>>=20 >>> Available at: http://level-can.surge.sh/html/autoapi/index.html >>>=20 >>> I would like to point out that in addition to class documentation, = you >> can >>> also document modules. >>>=20 >> = http://level-can.surge.sh/html/autoapi/airflow/executors/local_executor/in= dex.html >>> Currently, the `howto/operators.rst` file is used for this (Related = link: >>>=20 >> = https://airflow.readthedocs.io/en/latest/howto/operator.html#cloudsqlquery= operator >>> ) >>>=20 >>>=20 >>> On Tue, Feb 5, 2019 at 5:18 PM Ash Berlin-Taylor = wrote: >>>=20 >>>>> We want to rewrite the `integration.rst` file so that it does not >> contain >>>>> duplicates from `code.rst ' (API Reference). In the next step, >> introduce >>>>> the reference API generation based on the source code that will = replace >>>> the >>>>> `code.rst` file. >>>>=20 >>>> :100: Yes please! >>>>=20 >>>>=20 >>>> Given a number of integrations are across multiple files (n = operators, >> and >>>> m hooks) my first thought is that something in integration.rst, or = a >> file >>>> elsewhere in the docs/ tree is the place to put this. >>>>=20 >>>> On epydoc vs a sphinx extension I lean very heavily towards the = sphinx >>>> extension, as we are already using much of sphinx. >>>>=20 >>>> Can you create a _small_ example of what you'd propse for no.4 (I = don't >>>> want you to do a lot of work that might be wasted) >>>>=20 >>>> -ash >>>>=20 >>>>=20 >>>>> On 5 Feb 2019, at 15:55, Kamil Bregu=C5=82a = >>>> wrote: >>>>>=20 >>>>> Hello community, >>>>>=20 >>>>> While working on the documentation for the GCP operators, my team = at >>>>> Polidea encountered some confusion related to the structure of the >>>>> documentation. >>>>>=20 >>>>> Short story: >>>>>=20 >>>>> We want to rewrite the `integration.rst` file so that it does not >> contain >>>>> duplicates from `code.rst ' (API Reference). In the next step, >> introduce >>>>> the reference API generation based on the source code that will = replace >>>> the >>>>> `code.rst` file. >>>>>=20 >>>>> Long story: >>>>>=20 >>>>> Currently, the documentation contains two places where the = description >> of >>>>> classes related to operators is included. They are `code.rst` and >>>>> `integration.rst` files. >>>>>=20 >>>>> The `integration.rst` file contains information about integration, = in >>>>> particular for Azure: Microsoft Azure, AWS: Amazon Web Services, >>>>> Databricks, GCP: Google Cloud Platform, Qubole. Other = integrations, >>>>> however, do not have descriptions. >>>>>=20 >>>>> The `code.rst` file contains =E2=80=9CAPI Reference=E2=80=9D which = contains information >>>>> about *all* classes including those included in the file >>>> `integration.rst`. >>>>>=20 >>>>> Such duplication, however, is problematic for several reasons: >>>>>=20 >>>>> 1. >>>>>=20 >>>>> Users may feel lost and may not know which section they should = look >>>> into. >>>>> 2. >>>>>=20 >>>>> Changes must be made in many places which leads to = desynchronization. >>>>> Most often, changes are made only in the source code, so they do = not >>>> appear >>>>> in the generated documentation. >>>>> 3. >>>>>=20 >>>>> Linking to classes using the `class` directive for Sphinx is >>>>> inconclusive - if the code is embedded both in `integration.rst` = and >>>>> `code.rst` using the `autoclass` directive, we=E2=80=99re not sure = where the >>>> user >>>>> will be navigated. >>>>>=20 >>>>>=20 >>>>> There are several solutions:: >>>>>=20 >>>>> 1. >>>>>=20 >>>>> Leave it as is. Then we need to agree on which `autoclass` = directive >>>>> should have the `no-index` flags. >>>>> 2. >>>>>=20 >>>>> Delete duplicates from the `code.rst` file and add a note about = the >>>>> `integration.rst` file in the `code.rst` file. >>>>> 3. >>>>>=20 >>>>> Delete duplicates from the `integration.rst` file and add a note = about >>>>> the `code.rst` file in the `integration.rst` file. >>>>> 4. >>>>>=20 >>>>> Delete information from both files and generate the API = documentation >>>>> always based only on the source code. This solution means that we >> would >>>>> have to write less documentation. >>>>> There are ready tools that we can use: >>>>> 1. >>>>>=20 >>>>> epydoc - http://epydoc.sourceforge.net/ ; >>>>> 2. >>>>>=20 >>>>> autoapi extension for Sphinx - >>>> https://github.com/rtfd/sphinx-autoapi >>>>> ; >>>>> 3. >>>>>=20 >>>>> other - https://wiki.python.org/moin/DocumentationTools >>>>>=20 >>>>>=20 >>>>> The first, second, third solution does not solve all problems. In >>>>> particular, we still need to complete the `code.rst` and >>>> `integration.rst` >>>>> files. The fourth solution solves all problems, but is the most >> complex. >>>> It >>>>> is worth noting that mixing solutions is possible. For example, we = can >>>>> delete information from the file `integration.rst` as short term >> solution >>>>> and start working on creating another form of documentation as a = long >>>> term >>>>> solution. This is the best option in our opinion. >>>>>=20 >>>>> I=E2=80=99ve recently done a few activities related to this topic. >>>>>=20 >>>>> First, I added the noindex flag to autoclass directives for all >> operators >>>>> in `integration.rst` file. In rare cases (If any), this caused = links >> that >>>>> were previously directed to the file `integration.rst` to be = redirected >>>> to >>>>> the `code.rst` file. Elements had to be linked using `:class:` = instead >>>> of a >>>>> section link. Each operator is included in the new section in this >> file. >>>>>=20 >>>>> PR: https://github.com/apache/airflow/pull/4585 >>>>> >>>>>=20 >>>>> Second, I completed the `code.rst` file with the missing classes. >>>>>=20 >>>>> PR: https://github.com/apache/airflow/pull/4644 >>>>>=20 >>>>> I would like to ask which solution is the best in your opinion? = What >>>> steps >>>>> should we take to make the documentation more enjoyable? >>>>>=20 >>>>> Greetings >>>>>=20 >>>>> Kamil Bregu=C5=82a >>>>=20 >>>>=20 >>>=20 >>> -- >>>=20 >>> Kamil Bregu=C5=82a >>> Polidea | Software Engineer >>>=20 >>> M: +48 505 458 451 <+48505458451> >>> E: kamil.bregula@polidea.com >>> [image: Polidea] >>>=20 >>> We create human & business stories through technology. >>> Check out our projects! >>> [image: Github] [image: Facebook] >>> [image: Twitter] >>> [image: Linkedin] >>> [image: Instagram] >>> [image: Behance] >>> >>=20 >>=20 >=20 > --=20 >=20 > Kamil Bregu=C5=82a > Polidea | Software Engineer >=20 > M: +48 505 458 451 <+48505458451> > E: kamil.bregula@polidea.com > [image: Polidea] >=20 > We create human & business stories through technology. > Check out our projects! > [image: Github] [image: Facebook] > [image: Twitter] > [image: Linkedin] > [image: Instagram] > [image: Behance] >