airflow-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files
Date Wed, 01 Aug 2018 07:51:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564908#comment-16564908 ]

ASF GitHub Bot commented on AIRFLOW-2832:
-----------------------------------------

Fokko closed pull request #3670: [AIRFLOW-2832] Lint and resolve inconsistencies in Markdown files
URL: https://github.com/apache/incubator-airflow/pull/3670

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 6000d0e5ff..90452d954b 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,33 +1,34 @@
 Make sure you have checked _all_ steps below.
 
-### JIRA
-- [ ] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
-    - https://issues.apache.org/jira/browse/AIRFLOW-XXX
-    - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue.
+### Jira
 
+- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
+  - https://issues.apache.org/jira/browse/AIRFLOW-XXX
+  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 
 ### Description
-- [ ] Here are some details about my PR, including screenshots of any UI changes:
 
+- [ ] Here are some details about my PR, including screenshots of any UI changes:
 
 ### Tests
-- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
 
+- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
 
 ### Commits
-- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
-    1. Subject is separated from body by a blank line
-    2. Subject is limited to 50 characters
-    3. Subject does not end with a period
-    4. Subject uses the imperative mood ("add", not "adding")
-    5. Body wraps at 72 characters
-    6. Body explains "what" and "why", not "how"
 
+- [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
+  1. Subject is separated from body by a blank line
+  1. Subject is limited to 50 characters (not including Jira issue reference)
+  1. Subject does not end with a period
+  1. Subject uses the imperative mood ("add", not "adding")
+  1. Body wraps at 72 characters
+  1. Body explains "what" and "why", not "how"
 
 ### Documentation
-- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
-    - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
 
+- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
+  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
 
 ### Code Quality
+
 - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
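
The commit guidelines in the PR template above (blank line after the subject, a 50-character subject not counting the Jira reference, no trailing period, 72-character body lines) can be checked mechanically. The following is a minimal sketch and is not part of this PR or of Airflow: `check_commit_message` and `JIRA_PREFIX` are hypothetical names, and the imperative-mood and "what and why" rules are left out because they need human judgment.

```python
import re

# Hypothetical helper, not part of Airflow: checks a commit message against
# the mechanically verifiable guidelines from the PR template.
JIRA_PREFIX = re.compile(r"^\[AIRFLOW-(\d+|XXX)\]\s*")


def check_commit_message(message):
    """Return a list of guideline violations (empty if none were found)."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    subject = lines[0]
    body = lines[2:]

    if not JIRA_PREFIX.match(subject):
        problems.append("subject does not start with [AIRFLOW-XXX]")
    if len(lines) > 1 and lines[1].strip():
        problems.append("subject is not separated from body by a blank line")

    # The 50-character limit excludes the Jira issue reference.
    bare_subject = JIRA_PREFIX.sub("", subject)
    if len(bare_subject) > 50:
        problems.append("subject is longer than 50 characters")
    if bare_subject.endswith("."):
        problems.append("subject ends with a period")
    if any(len(line) > 72 for line in body):
        problems.append("body line is longer than 72 characters")
    return problems
```

A check like this could be called from a local `commit-msg` hook; how it is wired up is left to the contributor.
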
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 47a1a80549..2cf8e0218e 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -3,22 +3,21 @@
 Contributions are welcome and are greatly appreciated! Every
 little bit helps, and credit will always be given.
 
-
-# Table of Contents
-  * [TOC](#table-of-contents)
-  * [Types of Contributions](#types-of-contributions)
-      - [Report Bugs](#report-bugs)
-      - [Fix Bugs](#fix-bugs)
-      - [Implement Features](#implement-features)
-      - [Improve Documentation](#improve-documentation)
-      - [Submit Feedback](#submit-feedback)
-  * [Documentation](#documentation)
-  * [Development and Testing](#development-and-testing)
-      - [Setting up a development environment](#setting-up-a-development-environment)
-      - [Pull requests guidelines](#pull-request-guidelines)
-      - [Testing Locally](#testing-locally)
-  * [Changing the Metadata Database](#changing-the-metadata-database)
-
+## Table of Contents
+
+- [TOC](#table-of-contents)
+- [Types of Contributions](#types-of-contributions)
+  - [Report Bugs](#report-bugs)
+  - [Fix Bugs](#fix-bugs)
+  - [Implement Features](#implement-features)
+  - [Improve Documentation](#improve-documentation)
+  - [Submit Feedback](#submit-feedback)
+- [Documentation](#documentation)
+- [Development and Testing](#development-and-testing)
+  - [Setting up a development environment](#setting-up-a-development-environment)
+  - [Pull requests guidelines](#pull-request-guidelines)
+  - [Testing Locally](#testing-locally)
+- [Changing the Metadata Database](#changing-the-metadata-database)
 
 ## Types of Contributions
 
@@ -55,11 +54,9 @@ The best way to send feedback is to open an issue on [Apache Jira](https://issue
 
 If you are proposing a feature:
 
--   Explain in detail how it would work.
--   Keep the scope as narrow as possible, to make it easier to
-    implement.
--   Remember that this is a volunteer-driven project, and that
-    contributions are welcome :)
+- Explain in detail how it would work.
+- Keep the scope as narrow as possible, to make it easier to implement.
+- Remember that this is a volunteer-driven project, and that contributions are welcome :)
 
 ## Documentation
 
@@ -68,11 +65,15 @@ The latest API documentation is usually available
 you need to have set up an Airflow development environment (see below). Also
 install the `doc` extra.
 
-    pip install -e .[doc]
+```
+pip install -e .[doc]
+```
 
 Generate the documentation by running:
 
-    cd docs && ./build.sh
+```
+cd docs && ./build.sh
+```
 
 Only a subset of the API reference documentation builds. Install additional
 extras to build the full API reference.
@@ -122,11 +123,13 @@ Please install python(2.7.x or 3.4.x), mysql, and libxml by using system-level p
 managers like yum, apt-get for Linux, or homebrew for Mac OS at first.
 It is usually best to work in a virtualenv and tox. Install development requirements:
 
-    cd $AIRFLOW_HOME
-    virtualenv env
-    source env/bin/activate
-    pip install -e .[devel]
-    tox
+```
+cd $AIRFLOW_HOME
+virtualenv env
+source env/bin/activate
+pip install -e .[devel]
+tox
+```
 
 Feel free to customize based on the extras available in [setup.py](./setup.py)
 
@@ -135,52 +138,31 @@ Feel free to customize based on the extras available in [setup.py](./setup.py)
 Before you submit a pull request from your forked repo, check that it
 meets these guidelines:
 
-1. The pull request should include tests, either as doctests, unit tests, or
-both. The airflow repo uses [Travis CI](https://travis-ci.org/apache/incubator-airflow)
-to run the tests and [codecov](https://codecov.io/gh/apache/incubator-airflow)
-to track coverage. You can set up both for free on your fork. It will
-help you making sure you do not break the build with your PR and that you help
-increase coverage.
-2. Please [rebase your fork](http://stackoverflow.com/a/7244456/1110993),
-squash commits, and resolve all conflicts.
-3. Every pull request should have an associated
-[JIRA](https://issues.apache.org/jira/browse/AIRFLOW/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
-The JIRA link should also be contained in the PR description.
-4. Preface your commit's subject & PR's title with **[AIRFLOW-XXX]**
-where *XXX* is the JIRA number. We compose release notes (i.e. for Airflow releases) from all commit titles in a release.
-By placing the JIRA number in the commit title and hence in the release notes,
-Airflow users can look into JIRA and Github PRs for more details about a particular change.
-5. Add an [Apache License](http://www.apache.org/legal/src-headers.html)
- header to all new files
-6. If the pull request adds functionality, the docs should be updated as part
-of the same PR. Doc string are often sufficient.  Make sure to follow the
-Sphinx compatible standards.
-7. The pull request should work for Python 2.7 and 3.4. If you need help
-writing code that works in both Python 2 and 3, see the documentation at the
-[Python-Future project](http://python-future.org) (the future package is an
-Airflow requirement and should be used where possible).
-8. As Airflow grows as a project, we try to enforce a more consistent
-style and try to follow the Python community guidelines. We track this
-using [landscape.io](https://landscape.io/github/apache/incubator-airflow/),
-which you can setup on your fork as well to check before you submit your
-PR. We currently enforce most [PEP8](https://www.python.org/dev/peps/pep-0008/)
-and a few other linting rules. It is usually a good idea to lint locally
-as well using [flake8](https://flake8.readthedocs.org/en/latest/)
-using `flake8 airflow tests`. `git diff upstream/master -u -- "*.py" | flake8 --diff` will return any changed files in your branch that require linting.
-9. Please read this excellent [article](http://chris.beams.io/posts/git-commit/) on
-commit messages and adhere to them. It makes the lives of those who
-come after you a lot easier.
+1. The pull request should include tests, either as doctests, unit tests, or both. The airflow repo uses [Travis CI](https://travis-ci.org/apache/incubator-airflow) to run the tests and [codecov](https://codecov.io/gh/apache/incubator-airflow) to track coverage. You can set up both for free on your fork. It will help you making sure you do not break the build with your PR and that you help increase coverage.
+1. Please [rebase your fork](http://stackoverflow.com/a/7244456/1110993), squash commits, and resolve all conflicts.
+1. Every pull request should have an associated [JIRA](https://issues.apache.org/jira/browse/AIRFLOW/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). The JIRA link should also be contained in the PR description.
+1. Preface your commit's subject & PR's title with **[AIRFLOW-XXX]** where *XXX* is the JIRA number. We compose release notes (i.e. for Airflow releases) from all commit titles in a release. By placing the JIRA number in the commit title and hence in the release notes, Airflow users can look into JIRA and Github PRs for more details about a particular change.
+1. Add an [Apache License](http://www.apache.org/legal/src-headers.html) header to all new files
+1. If the pull request adds functionality, the docs should be updated as part of the same PR. Doc string are often sufficient.  Make sure to follow the Sphinx compatible standards.
+1. The pull request should work for Python 2.7 and 3.4. If you need help writing code that works in both Python 2 and 3, see the documentation at the [Python-Future project](http://python-future.org) (the future package is an Airflow requirement and should be used where possible).
+1. As Airflow grows as a project, we try to enforce a more consistent style and try to follow the Python community guidelines. We track this using [landscape.io](https://landscape.io/github/apache/incubator-airflow/), which you can setup on your fork as well to check before you submit your PR. We currently enforce most [PEP8](https://www.python.org/dev/peps/pep-0008/) and a few other linting rules. It is usually a good idea to lint locally as well using [flake8](https://flake8.readthedocs.org/en/latest/) using `flake8 airflow tests`. `git diff upstream/master -u -- "*.py" | flake8 --diff` will return any changed files in your branch that require linting.
+1. Please read this excellent [article](http://chris.beams.io/posts/git-commit/) on commit messages and adhere to them. It makes the lives of those who come after you a lot easier.
 
 ### Testing locally
 
 #### TL;DR
+
 Tests can then be run with (see also the [Running unit tests](#running-unit-tests) section below):
 
-    ./run_unit_tests.sh
+```
+./run_unit_tests.sh
+```
 
 Individual test files can be run with:
 
-    nosetests [path to file]
+```
+nosetests [path to file]
+```
 
 #### Running unit tests
 
@@ -251,13 +233,16 @@ While these may be phased out over time, these packages are currently not
 managed with npm.
 
 ### Node/npm versions
+
 Make sure you are using recent versions of node and npm. No problems have been found with node>=8.11.3 and npm>=6.1.3
 
 ### Using npm to generate bundled files
 
 #### npm
+
 First, npm must be available in your environment. If it is not you can run the following commands
 (taken from [this source](https://gist.github.com/DanHerbert/9520689))
+
 ```
 brew install node --without-npm
 echo prefix=~/.npm-packages >> ~/.npmrc
@@ -266,11 +251,13 @@ curl -L https://www.npmjs.com/install.sh | sh
 
 The final step is to add `~/.npm-packages/bin` to your `PATH` so commands you install globally are usable.
 Add something like this to your `.bashrc` file, then `source ~/.bashrc` to reflect the change.
+
 ```
 export PATH="$HOME/.npm-packages/bin:$PATH"
 ```
 
 #### npm packages
+
 To install third party libraries defined in `package.json`, run the
 following within the `airflow/www_rbac/` directory which will install them in a
 new `node_modules/` folder within `www_rbac/`.
@@ -296,13 +283,13 @@ npm run dev
 
 #### Upgrading npm packages
 
-Should you add or upgrade a npm package, which involves changing `package.json`, you'll need
to re-run `npm install` 
+Should you add or upgrade a npm package, which involves changing `package.json`, you'll need to re-run `npm install`
 and push the newly generated `package-lock.json` file so we get the reproducible build.
 
 #### Javascript Style Guide
 
-We try to enforce a more consistent style and try to follow the JS community guidelines.

-Once you add or modify any javascript code in the project, please make sure it follows the
guidelines 
+We try to enforce a more consistent style and try to follow the JS community guidelines.
+Once you add or modify any javascript code in the project, please make sure it follows the guidelines
 defined in [Airbnb JavaScript Style Guide](https://github.com/airbnb/javascript).
 Apache Airflow uses [ESLint](https://eslint.org/) as a tool for identifying and reporting on patterns in JavaScript,
 which can be used by running any of the following commands.
@@ -311,7 +298,6 @@ which can be used by running any of the following commands.
 # Check JS code in .js and .html files, and report any errors/warnings
 npm run lint
 
-# Check JS code in .js and .html files, report any errors/warnings and fix them if possible

+# Check JS code in .js and .html files, report any errors/warnings and fix them if possible
 npm run lint:fix
 ```
- 
diff --git a/README.md b/README.md
index 7256565c1c..11cf37c7b2 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,7 @@ makes it easy to visualize pipelines running in production,
 monitor progress, and troubleshoot issues when needed.
 
 ## Getting started
+
 Please visit the Airflow Platform documentation (latest **stable** release) for help with [installing Airflow](https://airflow.incubator.apache.org/installation.html), getting a [quick start](https://airflow.incubator.apache.org/start.html), or a more complete [tutorial](https://airflow.incubator.apache.org/tutorial.html).
 
 Documentation of GitHub master (latest development branch): [ReadTheDocs Documentation](https://airflow.readthedocs.io/en/latest/)
@@ -54,22 +55,28 @@ unit of work and continuity.
 ## User Interface
 
 - **DAGs**: Overview of all DAGs in your environment.
-![](/docs/img/dags.png)
+
+  ![](/docs/img/dags.png)
 
 - **Tree View**: Tree representation of a DAG that spans across time.
-![](/docs/img/tree.png)
+
+  ![](/docs/img/tree.png)
 
 - **Graph View**: Visualization of a DAG's dependencies and their current status for a specific run.
-![](/docs/img/graph.png)
+
+  ![](/docs/img/graph.png)
 
 - **Task Duration**: Total time spent on different tasks over time.
-![](/docs/img/duration.png)
+
+  ![](/docs/img/duration.png)
 
 - **Gantt View**: Duration and overlap of a DAG.
-![](/docs/img/gantt.png)
+
+  ![](/docs/img/gantt.png)
 
 - **Code View**:  Quick way to view source code of a DAG.
-![](/docs/img/code.png)
+
+  ![](/docs/img/code.png)
 
 ## Who uses Airflow?
 
@@ -79,7 +86,7 @@ if you may.
 
 Committers:
 
-* Refer to [Committers](https://cwiki.apache.org/confluence/display/AIRFLOW/Committers)
+- Refer to [Committers](https://cwiki.apache.org/confluence/display/AIRFLOW/Committers)
 
 Currently **officially** using Airflow:
 
@@ -130,7 +137,7 @@ Currently **officially** using Airflow:
 1. [CreditCards.com](https://www.creditcards.com/)[[@vmAggies](https://github.com/vmAggies) &  [@jay-wallaby](https://github.com/jay-wallaby)]
 1. [Creditas](https://www.creditas.com.br) [[@dcassiano](https://github.com/dcassiano)]
 1. [Custom Ink](https://www.customink.com/) [[@david-dalisay](https://github.com/david-dalisay), [@dmartin11](https://github.com/dmartin11) & [@mpeteuil](https://github.com/mpeteuil)]
-1. [Dailymotion](http://www.dailymotion.com/fr) [[@germaintanguy](https://github.com/germaintanguy)
& [@hc](https://github.com/hc)] 
+1. [Dailymotion](http://www.dailymotion.com/fr) [[@germaintanguy](https://github.com/germaintanguy) & [@hc](https://github.com/hc)]
 1. [Data Reply](https://www.datareply.co.uk/) [[@kaxil](https://github.com/kaxil)]
 1. [DataFox](https://www.datafox.com/) [[@sudowork](https://github.com/sudowork)]
 1. [Digital First Media](http://www.digitalfirstmedia.com/) [[@duffn](https://github.com/duffn) & [@mschmo](https://github.com/mschmo) & [@seanmuth](https://github.com/seanmuth)]
@@ -266,8 +273,7 @@ Currently **officially** using Airflow:
 
 ## Links
 
-
-* [Documentation](https://airflow.incubator.apache.org/)
-* [Chat](https://gitter.im/apache/incubator-airflow)
-* [Apache Airflow Incubation Status](http://incubator.apache.org/projects/airflow.html)
-* [More](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links)
+- [Documentation](https://airflow.incubator.apache.org/)
+- [Chat](https://gitter.im/apache/incubator-airflow)
+- [Apache Airflow Incubation Status](http://incubator.apache.org/projects/airflow.html)
+- [More](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links)
diff --git a/TODO.md b/TODO.md
index 9f1a8a9775..780ca20722 100644
--- a/TODO.md
+++ b/TODO.md
@@ -1,10 +1,11 @@
 #### Roadmap items
+
 * UI page answering "Why isn't this task instance running?"
 * Attempt removing DagBag caching for the web server
 * Distributed scheduler (supervisors)
-    * Get the supervisors to run sensors (as opposed to each sensor taking a slot)
-    * Improve DagBag differential refresh
-    * Pickle all the THINGS! supervisors maintains fresh, versioned pickles in the database as they monitor for change
+  * Get the supervisors to run sensors (as opposed to each sensor taking a slot)
+  * Improve DagBag differential refresh
+  * Pickle all the THINGS! supervisors maintains fresh, versioned pickles in the database as they monitor for change
 * Pre-prod running off of master
 * Containment / YarnExecutor / Docker?
 * Get s3 logs
@@ -12,19 +13,23 @@
 * Run Hive / Hadoop / HDFS tests in Travis-CI
 
 #### UI
+
 * Backfill form
 * Better task filtering int duration and landing time charts (operator toggle, task regex, uncheck all button)
 * Add templating to adhoc queries
 
 #### Backend
+
 * Add a run_only_latest flag to BaseOperator, runs only most recent task instance where deps are met
 * Raise errors when setting dependencies on task in foreign DAGs
 * Add an is_test flag to the run context
 
 #### Wishlist
+
 * Pause flag at the task level
 * Increase unit test coverage
 * Stats logging interface with support for stats and sqlalchemy to collect detailed information from the scheduler and dag processing times
 
 #### Other
-* deprecate TimeSensor
+
+* Deprecate TimeSensor
diff --git a/UPDATING.md b/UPDATING.md
index da80f56fcb..e4ca92f56c 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -20,6 +20,7 @@ We also provide a new cli command(``sync_perm``) to allow admin to auto sync per
 ### Setting UTF-8 as default mime_charset in email utils
 
 ### Add a configuration variable(default_dag_run_display_number) to control numbers of dag run for display
+
 Add a configuration variable(default_dag_run_display_number) under webserver section to control num of dag run to show in UI.
 
 ### Default executor for SubDagOperator is changed to SequentialExecutor
@@ -49,6 +50,7 @@ Run `airflow webserver` to start the new UI. This will bring up a log in page, e
 There are five roles created for Airflow by default: Admin, User, Op, Viewer, and Public. To configure roles/permissions, go to the `Security` tab and click `List Roles` in the new UI.
 
 #### Breaking changes
+
 - AWS Batch Operator renamed property queue to job_queue to prevent conflict with the internal queue from CeleryExecutor - AIRFLOW-2542
 - Users created and stored in the old users table will not be migrated automatically. FAB's built-in authentication support must be reconfigured.
 - Airflow dag home page is now `/home` (instead of `/admin`).
@@ -68,6 +70,7 @@ to have specified `explicit_defaults_for_timestamp=1` in your my.cnf under `[mys
 ### Celery config
 
 To make the config of Airflow compatible with Celery, some properties have been renamed:
+
 ```
 celeryd_concurrency -> worker_concurrency
 celery_result_backend -> result_backend
@@ -75,18 +78,22 @@ celery_ssl_active -> ssl_active
 celery_ssl_cert -> ssl_cert
 celery_ssl_key -> ssl_key
 ```
+
 Resulting in the same config parameters as Celery 4, with more transparency.
 
 ### GCP Dataflow Operators
+
 Dataflow job labeling is now supported in Dataflow{Java,Python}Operator with a default
 "airflow-version" label, please upgrade your google-cloud-dataflow or apache-beam version
 to 2.2.0 or greater.
 
 ### BigQuery Hooks and Operator
+
 The `bql` parameter passed to `BigQueryOperator` and `BigQueryBaseCursor.run_query` has been deprecated and renamed to `sql` for consistency purposes. Using `bql` will still work (and raise a `DeprecationWarning`), but is no longer
 supported and will be removed entirely in Airflow 2.0
 
 ### Redshift to S3 Operator
+
 With Airflow 1.9 or lower, Unload operation always included header row. In order to include header row,
 we need to turn off parallel unload. It is preferred to perform unload operation using all nodes so that it is
 faster for larger tables. So, parameter called `include_header` is added and default is set to False.
@@ -97,7 +104,9 @@ Header row will be added only if this parameter is set True and also in that cas
 With Airflow 1.9 or lower, there were two connection strings for the Google Cloud operators, both `google_cloud_storage_default` and `google_cloud_default`. This can be confusing and therefore the `google_cloud_storage_default` connection id has been replaced with `google_cloud_default` to make the connection id consistent across Airflow.
 
 ### Logging Configuration
+
 With Airflow 1.9 or lower, `FILENAME_TEMPLATE`, `PROCESSOR_FILENAME_TEMPLATE`, `LOG_ID_TEMPLATE`, `END_OF_LOG_MARK` were configured in `airflow_local_settings.py`. These have been moved into the configuration file, and hence if you were using a custom configuration file the following defaults need to be added.
+
 ```
 [core]
 fab_logging_level = WARN
@@ -114,18 +123,20 @@ elasticsearch_end_of_log_mark = end_of_log
 ### SSH Hook updates, along with new SSH Operator & SFTP Operator
 
 SSH Hook now uses the Paramiko library to create an ssh client connection, instead of the sub-process based ssh command execution previously (<1.9.0), so this is backward incompatible.
-  - update SSHHook constructor
-  - use SSHOperator class in place of SSHExecuteOperator which is removed now. Refer to test_ssh_operator.py for usage info.
-  - SFTPOperator is added to perform secure file transfer from serverA to serverB. Refer to test_sftp_operator.py.py for usage info.
-  - No updates are required if you are using ftpHook, it will continue to work as is.
+
+- update SSHHook constructor
+- use SSHOperator class in place of SSHExecuteOperator which is removed now. Refer to test_ssh_operator.py for usage info.
+- SFTPOperator is added to perform secure file transfer from serverA to serverB. Refer to test_sftp_operator.py.py for usage info.
+- No updates are required if you are using ftpHook, it will continue to work as is.
 
 ### S3Hook switched to use Boto3
 
 The airflow.hooks.S3_hook.S3Hook has been switched to use boto3 instead of the older boto (a.k.a. boto2). This results in a few backwards incompatible changes to the following classes: S3Hook:
-  - the constructors no longer accepts `s3_conn_id`. It is now called `aws_conn_id`.
-  - the default connection is now "aws_default" instead of "s3_default"
-  - the return type of objects returned by `get_bucket` is now boto3.s3.Bucket
-  - the return type of `get_key`, and `get_wildcard_key` is now an boto3.S3.Object.
+
+- the constructors no longer accepts `s3_conn_id`. It is now called `aws_conn_id`.
+- the default connection is now "aws_default" instead of "s3_default"
+- the return type of objects returned by `get_bucket` is now boto3.s3.Bucket
+- the return type of `get_key`, and `get_wildcard_key` is now an boto3.S3.Object.
 
 If you are using any of these in your DAGs and specify a connection ID you will need to update the parameter name for the connection to "aws_conn_id": S3ToHiveTransfer, S3PrefixSensor, S3KeySensor, RedshiftToS3Transfer.
 
@@ -301,10 +312,11 @@ The `file_task_handler` logger has been made more flexible. The default format c
 If you are logging to Google cloud storage, please see the [Google cloud platform documentation](https://airflow.incubator.apache.org/integration.html#gcp-google-cloud-platform) for logging instructions.
 
 If you are using S3, the instructions should be largely the same as the Google cloud platform instructions above. You will need a custom logging config. The `REMOTE_BASE_LOG_FOLDER` configuration key in your airflow config has been removed, therefore you will need to take the following steps:
- - Copy the logging configuration from [`airflow/config_templates/airflow_logging_settings.py`](https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/airflow_local_settings.py).
- - Place it in a directory inside the Python import path `PYTHONPATH`. If you are using Python 2.7, ensuring that any `__init__.py` files exist so that it is importable.
- - Update the config by setting the path of `REMOTE_BASE_LOG_FOLDER` explicitly in the config. The `REMOTE_BASE_LOG_FOLDER` key is not used anymore.
- - Set the `logging_config_class` to the filename and dict. For example, if you place `custom_logging_config.py` on the base of your pythonpath, you will need to set `logging_config_class = custom_logging_config.LOGGING_CONFIG` in your config as Airflow 1.8.
+
+- Copy the logging configuration from [`airflow/config_templates/airflow_logging_settings.py`](https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/airflow_local_settings.py).
+- Place it in a directory inside the Python import path `PYTHONPATH`. If you are using Python 2.7, ensuring that any `__init__.py` files exist so that it is importable.
+- Update the config by setting the path of `REMOTE_BASE_LOG_FOLDER` explicitly in the config. The `REMOTE_BASE_LOG_FOLDER` key is not used anymore.
+- Set the `logging_config_class` to the filename and dict. For example, if you place `custom_logging_config.py` on the base of your pythonpath, you will need to set `logging_config_class = custom_logging_config.LOGGING_CONFIG` in your config as Airflow 1.8.
 
 ### New Features
 
@@ -313,8 +325,10 @@ If you are using S3, the instructions should be largely the same as the Google c
 A new DaskExecutor allows Airflow tasks to be run in Dask Distributed clusters.
 
 ### Deprecated Features
+
 These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer
 supported and will be removed entirely in Airflow 2.0
+
 - If you're using the `google_cloud_conn_id` or `dataproc_cluster` argument names explicitly in `contrib.operators.Dataproc{*}Operator`(s), be sure to rename them to `gcp_conn_id` or `cluster_name`, respectively. We've renamed these arguments for consistency. (AIRFLOW-1323)
 
 - `post_execute()` hooks now take two arguments, `context` and `result`
@@ -338,30 +352,36 @@ a previously installed version of Airflow before installing 1.8.1.
 ## Airflow 1.8
 
 ### Database
+
 The database schema needs to be upgraded. Make sure to shutdown Airflow and make a backup of your database. To
 upgrade the schema issue `airflow upgradedb`.
 
 ### Upgrade systemd unit files
+
 Systemd unit files have been updated. If you use systemd please make sure to update these.
 
 > Please note that the webserver does not detach properly, this will be fixed in a future version.
 
 ### Tasks not starting although dependencies are met due to stricter pool checking
+
 Airflow 1.7.1 has issues with being able to over subscribe to a pool, ie. more slots could be used than were
 available. This is fixed in Airflow 1.8.0, but due to past issue jobs may fail to start although their
 dependencies are met after an upgrade. To workaround either temporarily increase the amount of slots above
 the amount of queued tasks or use a new pool.
 
 ### Less forgiving scheduler on dynamic start_date
+
 Using a dynamic start_date (e.g. `start_date = datetime.now()`) is not considered a best practice. The 1.8.0 scheduler
 is less forgiving in this area. If you encounter DAGs not being scheduled you can try using a fixed start_date and
 renaming your DAG. The last step is required to make sure you start with a clean slate, otherwise the old schedule can
 interfere.
 
 ### New and updated scheduler options
+
 Please read through the new scheduler options, defaults have changed since 1.7.1.
 
 #### child_process_log_directory
+
 In order to increase the robustness of the scheduler, DAGS are now processed in their own process. Therefore each
 DAG has its own log file for the scheduler. These log files are placed in `child_process_log_directory` which defaults to
 `<AIRFLOW_HOME>/scheduler/latest`. You will need to make sure these log files are removed.
@@ -369,24 +389,30 @@ DAG has its own log file for the scheduler. These log files are placed in `child
 > DAG logs or processor logs ignore and command line settings for log file locations.
 
 #### run_duration
+
 Previously the command line option `num_runs` was used to let the scheduler terminate after a certain amount of
 loops. This is now time bound and defaults to `-1`, which means run continuously. See also num_runs.
 
 #### num_runs
+
 Previously `num_runs` was used to let the scheduler terminate after a certain amount of loops. Now num_runs specifies
 the number of times to try to schedule each DAG file within `run_duration` time. Defaults to `-1`, which means try
 indefinitely. This is only available on the command line.
 
 #### min_file_process_interval
+
 After how much time should an updated DAG be picked up from the filesystem.
 
 #### min_file_parsing_loop_time
+
 How many seconds to wait between file-parsing loops to prevent the logs from being spammed.
 
 #### dag_dir_list_interval
+
 The frequency with which the scheduler should relist the contents of the DAG directory. If while developing +dags, they are not being picked up, have a look at this number and decrease it when necessary.
 
 #### catchup_by_default
+
 By default the scheduler will fill any missing interval DAG Runs between the last execution date and the current date.
 This setting changes that behavior to only execute the latest interval. This can also be specified per DAG as
 `catchup = False / True`. Command line backfills will still work.
@@ -417,6 +443,7 @@ required to whitelist these variables by adding the following to your configurat
      <value>airflow\.ctx\..*</value>
 </property>
 ```
+
 ### Google Cloud Operator and Hook alignment
 
 All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection
@@ -428,6 +455,7 @@ Also the old P12 key file type is not supported anymore and only the new JSON ke
 account.
 
 ### Deprecated Features
+
 These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer
 supported and will be removed entirely in Airflow 2.0
 
@@ -444,6 +472,7 @@ supported and will be removed entirely in Airflow 2.0
 - The config value secure_mode will default to True which will disable some insecure endpoints/features
 
 ### Known Issues
+
 There is a report that the default of "-1" for num_runs creates an issue where errors are reported while parsing tasks.
 It was not confirmed, but a workaround was found by changing the default back to `None`.
 
@@ -470,7 +499,9 @@ To continue using the default smtp email backend, change the email_backend line
 [email]
 email_backend = airflow.utils.send_email_smtp
 ```
+
 to:
+
 ```
 [email]
 email_backend = airflow.utils.email.send_email_smtp
@@ -483,7 +514,9 @@ To continue using S3 logging, update your config file so:
 ```
 s3_log_folder = s3://my-airflow-log-bucket/logs
 ```
+
 becomes:
+
 ```
 remote_base_log_folder = s3://my-airflow-log-bucket/logs
 remote_log_conn_id = <your desired s3 connection>
diff --git a/airflow/contrib/example_dags/example_twitter_README.md b/airflow/contrib/example_dags/example_twitter_README.md
index 319eac39f6..d7218bbfc2 100644
--- a/airflow/contrib/example_dags/example_twitter_README.md
+++ b/airflow/contrib/example_dags/example_twitter_README.md
@@ -7,11 +7,11 @@
 ***Overview:*** At first, we need tasks that will get the tweets of our interest and save them on the hard-disk. Then, we need subsequent tasks that will clean and analyze the tweets. Then we want to store these files into HDFS, and load them into a Data Warehousing platform like Hive or HBase. The main reason we have selected Hive here is because it gives us a familiar SQL like interface, and makes our life of writing different queries a lot easier. Finally, the DAG needs to store a summarized result to a traditional database, i.e. MySQL or PostgreSQL, which is used by a reporting or business intelligence application. In other words, we basically want to achieve the following steps:
 
 1. Fetch Tweets
-2. Clean Tweets
-3. Analyze Tweets
-4. Put Tweets to HDFS
-5. Load data to Hive
-6. Save Summary to MySQL
+1. Clean Tweets
+1. Analyze Tweets
+1. Put Tweets to HDFS
+1. Load data to Hive
+1. Save Summary to MySQL
 
 ***Screenshot:***
 <img src="http://i.imgur.com/rRpSO12.png" width="99%"/>
@@ -21,14 +21,16 @@
 The python functions here are just placeholders. In case you are interested to actually make this DAG fully functional, first start with filling out the scripts as separate files and importing them into the DAG with absolute or relative import. My approach was to store the retrieved data in memory using Pandas dataframe first, and then use the built in method to save the CSV file on hard-disk.
 The eight different CSV files are then put into eight different folders within HDFS. Each of the newly inserted files are then loaded into eight different external hive tables. Hive tables can be external or internal. In this case, we are inserting the data right into the table, and so we are making our tables internal. Each file is inserted into the respected Hive table named after the twitter channel, i.e. toTwitter_A or fromTwitter_A. It is also important to note that when we created the tables, we facilitated for partitioning by date using the variable dt and declared comma as the row deliminator. The partitioning is very handy and ensures our query execution time remains constant even with growing volume of data.
 As most probably these folders and hive tables doesn't exist in your system, you will get an error for these tasks within the DAG. If you rebuild a function DAG from this example, make sure those folders and hive tables exists. When you create the table, keep the consideration of table partitioning and declaring comma as the row deliminator in your mind. Furthermore, you may also need to skip headers on each read and ensure that the user under which you have Airflow running has the right permission access. Below is a sample HQL snippet on creating such table:
+
 ```
 CREATE TABLE toTwitter_A(id BIGINT, id_str STRING
-						created_at STRING, text STRING)
-						PARTITIONED BY (dt STRING)
-						ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
-						STORED AS TEXTFILE;
-						alter table toTwitter_A SET serdeproperties ('skip.header.line.count' = '1');
+                         created_at STRING, text STRING)
+                         PARTITIONED BY (dt STRING)
+                         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+                         STORED AS TEXTFILE;
+                         alter table toTwitter_A SET serdeproperties ('skip.header.line.count' = '1');
 ```
+
 When you review the code for the DAG, you will notice that these tasks are generated using for loop. These two for loops could be combined into one loop. However, in most cases, you will be running different analysis on your incoming incoming and outgoing tweets, and hence they are kept separated in this example.
 Final step is a running the broker script, brokerapi.py, which will run queries in Hive and store the summarized data to MySQL in our case. To connect to Hive, pyhs2 library is extremely useful and easy to use. To insert data into MySQL from Python, sqlalchemy is also a good one to use.
 I hope you find this tutorial useful. If you have question feel free to ask me on [Twitter](https://twitter.com/EkhtiarSyed) or via the live Airflow chatroom room in [Gitter](https://gitter.im/airbnb/airflow).<p>
diff --git a/airflow/www/templates/airflow/variables/README.md b/airflow/www/templates/airflow/variables/README.md
index 3fd539f8b5..bf4d80b684 100644
--- a/airflow/www/templates/airflow/variables/README.md
+++ b/airflow/www/templates/airflow/variables/README.md
@@ -1,17 +1,18 @@
-## Variable Editor
-----
+# Variable Editor
+
 This folder contains forms used to edit values in the "Variable" key-value
 store.  This data can be edited under the "Admin" admin tab, but sometimes
 it is preferable to use a form that can perform checking and provide a nicer
 interface.
 
-### Adding a new form
+## Adding a new form
 
 1. Create an html template in `templates/variables` folder
-2. Provide an interface for the user to provide input data
-3. Submit a post request that adds the data as json.
+1. Provide an interface for the user to provide input data
+1. Submit a post request that adds the data as json.
 
 An example ajax POST request is provided below:
+
 ```js
 $("#submit-btn").click(function() {
   form_data = getData()
diff --git a/dev/README.md b/dev/README.md
index f5c93309bd..2f1ecd75ae 100755
--- a/dev/README.md
+++ b/dev/README.md
@@ -8,9 +8,10 @@ It is very important that PRs reference a JIRA issue. The preferred way to do th
 
 __Please note:__ this tool will restore your current branch when it finishes, but you will lose any uncommitted changes. Make sure you commit any changes you wish to keep before proceeding.
 
-
 ### Execution
+
 Simply execute the `airflow-pr` tool:
+
 ```
 $ ./airflow-pr
 Usage: airflow-pr [OPTIONS] COMMAND [ARGS]...
@@ -49,12 +50,15 @@ Execute `airflow-pr setup_git_remotes` to configure the default (expected) git r
 ### Configuration
 
 #### Python Libraries
+
 The merge tool requires the `click` and `jira` libraries to be installed. If the libraries are not found, the user will be prompted to install them:
+
 ```bash
 pip install click jira
 ```
 
 #### git Remotes
+
 tl;dr run `airflow-pr setup_git_remotes` before using the tool for the first time.
 
 Before using the merge tool, users need to make sure their git remotes are configured. By default, the tool assumes a setup like the one below, where the github repo remote is named `github` and the Apache repo remote is named `apache`. If users have other remote names, they can be supplied by setting environment variables `GITHUB_REMOTE_NAME` and `APACHE_REMOTE_NAME`, respectively.
@@ -72,25 +76,33 @@ origin	https://github.com/<USER>/airflow (push)
 ```
 
 #### JIRA
+
 Users should set environment variables `JIRA_USERNAME` and `JIRA_PASSWORD` corresponding to their ASF JIRA login. This will allow the tool to automatically close issues. If they are not set, the user will be prompted every time.
 
 #### GitHub OAuth Token
+
 Unauthenticated users can only make 60 requests/hour to the Github API. If you get an error about exceeding the rate, you will need to set a `GITHUB_OAUTH_KEY` environment variable that contains a token value. Users can generate tokens from their GitHub profile.
 
 ## Airflow release signing tool
+
 The release signing tool can be used to create the SHA512/MD5 and ASC files that required for Apache releases.
 
 ### Execution
-To create a release tar ball execute following command from Airflow's root. 
 
-`python setup.py compile_assets sdist --formats=gztar`
+To create a release tarball execute following command from Airflow's root.
 
-*Note: `compile_assets` command build the frontend assets (JS and CSS) files for the 
+```bash
+python setup.py compile_assets sdist --formats=gztar
+```
+
+*Note: `compile_assets` command build the frontend assets (JS and CSS) files for the
 Web UI using webpack and npm. Please make sure you have `npm` installed on your local machine globally.
 Details on how to install `npm` can be found in CONTRIBUTING.md file.*
 
 After that navigate to relative directory i.e., `cd dist` and sign the release files.
 
-`../dev/sign.sh <the_created_tar_ball.tar.gz` 
+```bash
+../dev/sign.sh <the_created_tar_ball.tar.gz
+```
 
 Signing files will be created in the same directory.
diff --git a/scripts/ci/kubernetes/README.md b/scripts/ci/kubernetes/README.md
index 5d1f9c195c..e694bdf8e9 100644
--- a/scripts/ci/kubernetes/README.md
+++ b/scripts/ci/kubernetes/README.md
@@ -5,6 +5,7 @@ If you don't have minikube installed, please run `./minikube/start_minikube.sh`
 First build the docker images by running `./docker/build.sh`. This will build the image and push it to the local registry. Secondly, deploy Apache Airflow using `./kube/deploy.sh`. Finally, open the Airflow webserver page by browsing to `http://192.168.99.100:30809/admin/` (on OSX).
 
 When kicking of a new job, you should be able to see new pods being kicked off:
+
 ```
 $ kubectl get pods
 NAME                                                                  READY     STATUS             RESTARTS   AGE


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Inconsistencies and linter errors across markdown files
> -------------------------------------------------------
>
>                 Key: AIRFLOW-2832
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2832
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: docs, Documentation
>            Reporter: Taylor Edmiston
>            Assignee: Taylor Edmiston
>            Priority: Minor
>
> There are a number of inconsistencies within and across markdown files in the Airflow project.  Most of these are simple formatting issues easily fixed by linting (e.g., with mdl).
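
Many of the hunks in the diff insert a blank line directly after a heading. As a rough illustration of that kind of check (a minimal sketch, not mdl's actual implementation; the function name `headings_missing_blank_line` and the sample text are hypothetical), the following Python snippet reports ATX headings that are immediately followed by non-blank text, skipping anything inside fenced code blocks:

```python
import re

# ATX headings: one to six '#' characters followed by whitespace.
HEADING = re.compile(r"^#{1,6}\s")
# Opening or closing fence of a code block.
FENCE = re.compile(r"^(```|~~~)")


def headings_missing_blank_line(text):
    """Return 1-based line numbers of headings not followed by a blank line."""
    lines = text.splitlines()
    flagged = []
    in_fence = False
    for index, line in enumerate(lines):
        if FENCE.match(line):
            # Toggle fence state so '#' comments in code are not treated as headings.
            in_fence = not in_fence
            continue
        if in_fence or not HEADING.match(line):
            continue
        # A heading on the last line has nothing after it, so it passes.
        following = lines[index + 1] if index + 1 < len(lines) else ""
        if following.strip():
            flagged.append(index + 1)
    return flagged


# Hypothetical sample: the first heading lacks a trailing blank line.
sample = (
    "### Execution\n"
    "Simply execute the `airflow-pr` tool:\n"
    "\n"
    "#### JIRA\n"
    "\n"
    "Users should set variables.\n"
)
print(headings_missing_blank_line(sample))
```

A real linter covers many more rules (list numbering, trailing whitespace, fence spacing); this sketch only shows the shape of a single rule.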