airflow-commits mailing list archives

From maximebeauche...@apache.org
Subject [18/34] incubator-airflow-site git commit: Initial commit
Date Sun, 05 Jun 2016 05:24:08 GMT
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/9e19165c/_modules/vertica_operator.html
----------------------------------------------------------------------
diff --git a/_modules/vertica_operator.html b/_modules/vertica_operator.html
new file mode 100644
index 0000000..54caf66
--- /dev/null
+++ b/_modules/vertica_operator.html
@@ -0,0 +1,233 @@
+
+
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  
+  <title>vertica_operator &mdash; Airflow Documentation</title>
+  
+
+  
+  
+
+  
+
+  
+  
+    
+
+  
+
+  
+  
+    <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+  
+
+  
+
+  
+    <link rel="top" title="Airflow Documentation" href="../index.html"/>
+        <link rel="up" title="Module code" href="index.html"/> 
+
+  
+  <script src="../_static/js/modernizr.min.js"></script>
+
+</head>
+
+<body class="wy-body-for-nav" role="document">
+
+  <div class="wy-grid-for-nav">
+
+    
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-scroll">
+        <div class="wy-side-nav-search">
+          
+
+          
+            <a href="../index.html" class="icon icon-home"> Airflow
+          
+
+          
+          </a>
+
+          
+            
+            
+          
+
+          
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+
+          
+        </div>
+
+        <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+          
+            
+            
+                <ul>
+<li class="toctree-l1"><a class="reference internal" href="../project.html">Project</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../license.html">License</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../start.html">Quick Start</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../installation.html">Installation</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../tutorial.html">Tutorial</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../configuration.html">Configuration</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../ui.html">UI / Screenshots</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../concepts.html">Concepts</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../profiling.html">Data Profiling</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../cli.html">Command Line Interface</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../scheduler.html">Scheduling &amp; Triggers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../plugins.html">Plugins</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../security.html">Security</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../faq.html">FAQ</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../code.html">API Reference</a></li>
+</ul>
+
+            
+          
+        </div>
+      </div>
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
+
+      
+      <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
+        <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+        <a href="../index.html">Airflow</a>
+      </nav>
+
+
+      
+      <div class="wy-nav-content">
+        <div class="rst-content">
+          
+
+
+
+
+
+<div role="navigation" aria-label="breadcrumbs navigation">
+  <ul class="wy-breadcrumbs">
+    <li><a href="../index.html">Docs</a> &raquo;</li>
+      
+          <li><a href="index.html">Module code</a> &raquo;</li>
+      
+    <li>vertica_operator</li>
+      <li class="wy-breadcrumbs-aside">
+        
+          
+        
+      </li>
+  </ul>
+  <hr/>
+</div>
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+            
+  <h1>Source code for vertica_operator</h1><div class="highlight"><pre>
+<span></span><span class="kn">import</span> <span class="nn">logging</span>
+
+<span class="kn">from</span> <span class="nn">airflow.contrib.hooks</span> <span class="kn">import</span> <span class="n">VerticaHook</span>
+<span class="kn">from</span> <span class="nn">airflow.models</span> <span class="kn">import</span> <span class="n">BaseOperator</span>
+<span class="kn">from</span> <span class="nn">airflow.utils.decorators</span> <span class="kn">import</span> <span class="n">apply_defaults</span>
+
+
+<div class="viewcode-block" id="VerticaOperator"><a class="viewcode-back" href="../code.html#airflow.contrib.operators.VerticaOperator">[docs]</a><span class="k">class</span> <span class="nc">VerticaOperator</span><span class="p">(</span><span class="n">BaseOperator</span><span class="p">):</span>
+    <span class="sd">&quot;&quot;&quot;</span>
+<span class="sd">    Executes sql code in a specific Vertica database</span>
+
+<span class="sd">    :param vertica_conn_id: reference to a specific Vertica database</span>
+<span class="sd">    :type vertica_conn_id: string</span>
+<span class="sd">    :param sql: the sql code to be executed</span>
+<span class="sd">    :type sql: Can receive a str representing a sql statement,</span>
+<span class="sd">        a list of str (sql statements), or a reference to a template file.</span>
+<span class="sd">        Template references are recognized by str ending in &#39;.sql&#39;</span>
+<span class="sd">    &quot;&quot;&quot;</span>
+
+    <span class="n">template_fields</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;sql&#39;</span><span class="p">,)</span>
+    <span class="n">template_ext</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;.sql&#39;</span><span class="p">,)</span>
+    <span class="n">ui_color</span> <span class="o">=</span> <span class="s1">&#39;#b4e0ff&#39;</span>
+
+    <span class="nd">@apply_defaults</span>
+    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sql</span><span class="p">,</span> <span class="n">vertica_conn_id</span><span class="o">=</span><span class="s1">&#39;vertica_default&#39;</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
+        <span class="nb">super</span><span class="p">(</span><span class="n">VerticaOperator</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">vertica_conn_id</span> <span class="o">=</span> <span class="n">vertica_conn_id</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">sql</span> <span class="o">=</span> <span class="n">sql</span>
+
+    <span class="k">def</span> <span class="nf">execute</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>
+        <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;Executing: &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sql</span><span class="p">))</span>
+        <span class="n">hook</span> <span class="o">=</span> <span class="n">VerticaHook</span><span class="p">(</span><span class="n">vertica_conn_id</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">vertica_conn_id</span><span class="p">)</span>
+        <span class="n">hook</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sql</span><span class="p">)</span></div>
+</pre></div>
+
+           </div>
+          </div>
+          <footer>
+  
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2014, Maxime Beauchemin, Airbnb.
+
+    </p>
+  </div>
+  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. 
+
+</footer>
+
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+  
+
+
+  
+
+    <script type="text/javascript">
+        var DOCUMENTATION_OPTIONS = {
+            URL_ROOT:'../',
+            VERSION:'',
+            COLLAPSE_INDEX:false,
+            FILE_SUFFIX:'.html',
+            HAS_SOURCE:  true
+        };
+    </script>
+      <script type="text/javascript" src="../_static/jquery.js"></script>
+      <script type="text/javascript" src="../_static/underscore.js"></script>
+      <script type="text/javascript" src="../_static/doctools.js"></script>
+
+  
+
+  
+  
+    <script type="text/javascript" src="../_static/js/theme.js"></script>
+  
+
+  
+  
+  <script type="text/javascript">
+      jQuery(function () {
+          SphinxRtdTheme.StickyNav.enable();
+      });
+  </script>
+   
+
+</body>
+</html>
\ No newline at end of file

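As a hedged aside, a minimal sketch of how the VerticaOperator rendered above might be used in a DAG. The dag_id, start date, and query are placeholders, and a ``vertica_default`` connection is assumed to be configured::

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators import VerticaOperator

    dag = DAG('vertica_example', start_date=datetime(2016, 1, 1))

    refresh = VerticaOperator(
        task_id='refresh_summary',
        vertica_conn_id='vertica_default',  # default from the operator signature
        sql='SELECT 1',                     # a str, a list of str, or a *.sql template
        dag=dag)
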
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/9e19165c/_modules/vertica_to_hive.html
----------------------------------------------------------------------
diff --git a/_modules/vertica_to_hive.html b/_modules/vertica_to_hive.html
new file mode 100644
index 0000000..abe189b
--- /dev/null
+++ b/_modules/vertica_to_hive.html
@@ -0,0 +1,316 @@
+
+
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  
+  <title>vertica_to_hive &mdash; Airflow Documentation</title>
+  
+
+  
+  
+
+  
+
+  
+  
+    
+
+  
+
+  
+  
+    <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+  
+
+  
+
+  
+    <link rel="top" title="Airflow Documentation" href="../index.html"/>
+        <link rel="up" title="Module code" href="index.html"/> 
+
+  
+  <script src="../_static/js/modernizr.min.js"></script>
+
+</head>
+
+<body class="wy-body-for-nav" role="document">
+
+  <div class="wy-grid-for-nav">
+
+    
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-scroll">
+        <div class="wy-side-nav-search">
+          
+
+          
+            <a href="../index.html" class="icon icon-home"> Airflow
+          
+
+          
+          </a>
+
+          
+            
+            
+          
+
+          
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+
+          
+        </div>
+
+        <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+          
+            
+            
+                <ul>
+<li class="toctree-l1"><a class="reference internal" href="../project.html">Project</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../license.html">License</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../start.html">Quick Start</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../installation.html">Installation</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../tutorial.html">Tutorial</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../configuration.html">Configuration</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../ui.html">UI / Screenshots</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../concepts.html">Concepts</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../profiling.html">Data Profiling</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../cli.html">Command Line Interface</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../scheduler.html">Scheduling &amp; Triggers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../plugins.html">Plugins</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../security.html">Security</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../faq.html">FAQ</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../code.html">API Reference</a></li>
+</ul>
+
+            
+          
+        </div>
+      </div>
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
+
+      
+      <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
+        <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+        <a href="../index.html">Airflow</a>
+      </nav>
+
+
+      
+      <div class="wy-nav-content">
+        <div class="rst-content">
+          
+
+
+
+
+
+<div role="navigation" aria-label="breadcrumbs navigation">
+  <ul class="wy-breadcrumbs">
+    <li><a href="../index.html">Docs</a> &raquo;</li>
+      
+          <li><a href="index.html">Module code</a> &raquo;</li>
+      
+    <li>vertica_to_hive</li>
+      <li class="wy-breadcrumbs-aside">
+        
+          
+        
+      </li>
+  </ul>
+  <hr/>
+</div>
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+            
+  <h1>Source code for vertica_to_hive</h1><div class="highlight"><pre>
+<span></span><span class="kn">from</span> <span class="nn">builtins</span> <span class="kn">import</span> <span class="nb">chr</span>
+<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">OrderedDict</span>
+<span class="kn">import</span> <span class="nn">unicodecsv</span> <span class="kn">as</span> <span class="nn">csv</span>
+<span class="kn">import</span> <span class="nn">logging</span>
+<span class="kn">from</span> <span class="nn">tempfile</span> <span class="kn">import</span> <span class="n">NamedTemporaryFile</span>
+
+<span class="kn">from</span> <span class="nn">airflow.hooks</span> <span class="kn">import</span> <span class="n">HiveCliHook</span>
+<span class="kn">from</span> <span class="nn">airflow.contrib.hooks</span> <span class="kn">import</span> <span class="n">VerticaHook</span>
+<span class="kn">from</span> <span class="nn">airflow.models</span> <span class="kn">import</span> <span class="n">BaseOperator</span>
+<span class="kn">from</span> <span class="nn">airflow.utils.decorators</span> <span class="kn">import</span> <span class="n">apply_defaults</span>
+
+<div class="viewcode-block" id="VerticaToHiveTransfer"><a class="viewcode-back" href="../code.html#airflow.contrib.operators.VerticaToHiveTransfer">[docs]</a><span class="k">class</span> <span class="nc">VerticaToHiveTransfer</span><span class="p">(</span><span class="n">BaseOperator</span><span class="p">):</span>
+    <span class="sd">&quot;&quot;&quot;</span>
+<span class="sd">    Moves data from Vertica to Hive. The operator runs</span>
+<span class="sd">    your query against Vertica, stores the results in a local</span>
+<span class="sd">    file before loading them into a Hive table. If the ``create`` or</span>
+<span class="sd">    ``recreate`` arguments are set to ``True``,</span>
+<span class="sd">    ``CREATE TABLE`` and ``DROP TABLE`` statements are generated.</span>
+<span class="sd">    Hive data types are inferred from the cursor&#39;s metadata.</span>
+<span class="sd">    Note that the table generated in Hive uses ``STORED AS textfile``</span>
+<span class="sd">    which isn&#39;t the most efficient serialization format. If a</span>
+<span class="sd">    large amount of data is loaded and/or if the table gets</span>
+<span class="sd">    queried considerably, you may want to use this operator only to</span>
+<span class="sd">    stage the data into a temporary table before loading it into its</span>
+<span class="sd">    final destination using a ``HiveOperator``.</span>
+
+<span class="sd">    :param sql: SQL query to execute against the Vertica database</span>
+<span class="sd">    :type sql: str</span>
+<span class="sd">    :param hive_table: target Hive table, use dot notation to target a</span>
+<span class="sd">        specific database</span>
+<span class="sd">    :type hive_table: str</span>
+<span class="sd">    :param create: whether to create the table if it doesn&#39;t exist</span>
+<span class="sd">    :type create: bool</span>
+<span class="sd">    :param recreate: whether to drop and recreate the table at every execution</span>
+<span class="sd">    :type recreate: bool</span>
+<span class="sd">    :param partition: target partition as a dict of partition columns and values</span>
+<span class="sd">    :type partition: dict</span>
+<span class="sd">    :param delimiter: field delimiter in the file</span>
+<span class="sd">    :type delimiter: str</span>
+<span class="sd">    :param vertica_conn_id: source Vertica connection</span>
+<span class="sd">    :type vertica_conn_id: str</span>
+<span class="sd">    :param hive_conn_id: destination hive connection</span>
+<span class="sd">    :type hive_conn_id: str</span>
+
+<span class="sd">    &quot;&quot;&quot;</span>
+
+    <span class="n">template_fields</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;sql&#39;</span><span class="p">,</span> <span class="s1">&#39;partition&#39;</span><span class="p">,</span> <span class="s1">&#39;hive_table&#39;</span><span class="p">)</span>
+    <span class="n">template_ext</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;.sql&#39;</span><span class="p">,)</span>
+    <span class="n">ui_color</span> <span class="o">=</span> <span class="s1">&#39;#b4e0ff&#39;</span>
+
+    <span class="nd">@apply_defaults</span>
+    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span>
+            <span class="bp">self</span><span class="p">,</span>
+            <span class="n">sql</span><span class="p">,</span>
+            <span class="n">hive_table</span><span class="p">,</span>
+            <span class="n">create</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
+            <span class="n">recreate</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
+            <span class="n">partition</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
+            <span class="n">delimiter</span><span class="o">=</span><span class="nb">chr</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span>
+            <span class="n">vertica_conn_id</span><span class="o">=</span><span class="s1">&#39;vertica_default&#39;</span><span class="p">,</span>
+            <span class="n">hive_cli_conn_id</span><span class="o">=</span><span class="s1">&#39;hive_cli_default&#39;</span><span class="p">,</span>
+            <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
+        <span class="nb">super</span><span class="p">(</span><span class="n">VerticaToHiveTransfer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">sql</span> <span class="o">=</span> <span class="n">sql</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">hive_table</span> <span class="o">=</span> <span class="n">hive_table</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">partition</span> <span class="o">=</span> <span class="n">partition</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">create</span> <span class="o">=</span> <span class="n">create</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">recreate</span> <span class="o">=</span> <span class="n">recreate</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">delimiter</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">delimiter</span><span class="p">)</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">vertica_conn_id</span> <span class="o">=</span> <span class="n">vertica_conn_id</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">hive_cli_conn_id</span> <span class="o">=</span> <span class="n">hive_cli_conn_id</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">partition</span> <span class="o">=</span> <span class="n">partition</span> <span class="ow">or</span> <span class="p">{}</span>
+
+    <span class="nd">@classmethod</span>
+    <span class="k">def</span> <span class="nf">type_map</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">vertica_type</span><span class="p">):</span>
+        <span class="c1"># vertica-python datatype.py does not provide full access to the type mapping.</span>
+        <span class="c1"># Manual hack. Reference: https://github.com/uber/vertica-python/blob/master/vertica_python/vertica/column.py</span>
+        <span class="n">d</span> <span class="o">=</span> <span class="p">{</span>
+            <span class="mi">5</span><span class="p">:</span> <span class="s1">&#39;BOOLEAN&#39;</span><span class="p">,</span>
+            <span class="mi">6</span><span class="p">:</span> <span class="s1">&#39;INT&#39;</span><span class="p">,</span>
+            <span class="mi">7</span><span class="p">:</span> <span class="s1">&#39;FLOAT&#39;</span><span class="p">,</span>
+            <span class="mi">8</span><span class="p">:</span> <span class="s1">&#39;STRING&#39;</span><span class="p">,</span>
+            <span class="mi">9</span><span class="p">:</span> <span class="s1">&#39;STRING&#39;</span><span class="p">,</span>
+            <span class="mi">16</span><span class="p">:</span> <span class="s1">&#39;FLOAT&#39;</span><span class="p">,</span>
+        <span class="p">}</span>
+        <span class="k">return</span> <span class="n">d</span><span class="p">[</span><span class="n">vertica_type</span><span class="p">]</span> <span class="k">if</span> <span class="n">vertica_type</span> <span class="ow">in</span> <span class="n">d</span> <span class="k">else</span> <span class="s1">&#39;STRING&#39;</span>
+
+    <span class="k">def</span> <span class="nf">execute</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>
+        <span class="n">hive</span> <span class="o">=</span> <span class="n">HiveCliHook</span><span class="p">(</span><span class="n">hive_cli_conn_id</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">hive_cli_conn_id</span><span class="p">)</span>
+        <span class="n">vertica</span> <span class="o">=</span> <span class="n">VerticaHook</span><span class="p">(</span><span class="n">vertica_conn_id</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">vertica_conn_id</span><span class="p">)</span>
+
+        <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">&quot;Dumping Vertica query results to local file&quot;</span><span class="p">)</span>
+        <span class="n">conn</span> <span class="o">=</span> <span class="n">vertica</span><span class="o">.</span><span class="n">get_conn</span><span class="p">()</span>
+        <span class="n">cursor</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span>
+        <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sql</span><span class="p">)</span>
+        <span class="k">with</span> <span class="n">NamedTemporaryFile</span><span class="p">(</span><span class="s2">&quot;w&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
+            <span class="n">csv_writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">delimiter</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s1">&#39;utf-8&#39;</span><span class="p">)</span>
+            <span class="n">field_dict</span> <span class="o">=</span> <span class="n">OrderedDict</span><span class="p">()</span>
+            <span class="n">col_count</span> <span class="o">=</span> <span class="mi">0</span>
+            <span class="k">for</span> <span class="n">field</span> <span class="ow">in</span> <span class="n">cursor</span><span class="o">.</span><span class="n">description</span><span class="p">:</span>
+                <span class="n">col_count</span> <span class="o">+=</span> <span class="mi">1</span>
+                <span class="n">col_position</span> <span class="o">=</span> <span class="s2">&quot;Column{position}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">position</span><span class="o">=</span><span class="n">col_count</span><span class="p">)</span>
+                <span class="n">field_dict</span><span class="p">[</span><span class="n">col_position</span> <span class="k">if</span> <span class="n">field</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;&#39;</span> <span class="k">else</span> <span class="n">field</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">type_map</span><span class="p">(</span><span class="n">field</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
+            <span class="n">csv_writer</span><span class="o">.</span><span class="n">writerows</span><span class="p">(</span><span class="n">cursor</span><span class="o">.</span><span class="n">iterate</span><span class="p">())</span>
+            <span class="n">f</span><span class="o">.</span><span class="n">flush</span><span class="p">()</span>
+            <span class="n">cursor</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
+            <span class="n">conn</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
+            <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">&quot;Loading file into Hive&quot;</span><span class="p">)</span>
+            <span class="n">hive</span><span class="o">.</span><span class="n">load_file</span><span class="p">(</span>
+                <span class="n">f</span><span class="o">.</span><span class="n">name</span><span class="p">,</span>
+                <span class="bp">self</span><span class="o">.</span><span class="n">hive_table</span><span class="p">,</span>
+                <span class="n">field_dict</span><span class="o">=</span><span class="n">field_dict</span><span class="p">,</span>
+                <span class="n">create</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">create</span><span class="p">,</span>
+                <span class="n">partition</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">partition</span><span class="p">,</span>
+                <span class="n">delimiter</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">delimiter</span><span class="p">,</span>
+                <span class="n">recreate</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">recreate</span><span class="p">)</span></div>
+</pre></div>
+
+           </div>
+          </div>
+          <footer>
+  
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2014, Maxime Beauchemin, Airbnb.
+
+    </p>
+  </div>
+  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. 
+
+</footer>
+
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+  
+
+
+  
+
+    <script type="text/javascript">
+        var DOCUMENTATION_OPTIONS = {
+            URL_ROOT:'../',
+            VERSION:'',
+            COLLAPSE_INDEX:false,
+            FILE_SUFFIX:'.html',
+            HAS_SOURCE:  true
+        };
+    </script>
+      <script type="text/javascript" src="../_static/jquery.js"></script>
+      <script type="text/javascript" src="../_static/underscore.js"></script>
+      <script type="text/javascript" src="../_static/doctools.js"></script>
+
+  
+
+  
+  
+    <script type="text/javascript" src="../_static/js/theme.js"></script>
+  
+
+  
+  
+  <script type="text/javascript">
+      jQuery(function () {
+          SphinxRtdTheme.StickyNav.enable();
+      });
+  </script>
+   
+
+</body>
+</html>
\ No newline at end of file

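In the same hedged spirit, a sketch of staging a Vertica query result into Hive with the VerticaToHiveTransfer shown above; the query, table, and partition values are invented for illustration::

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators import VerticaToHiveTransfer

    dag = DAG('vertica_to_hive_example', start_date=datetime(2016, 1, 1))

    stage = VerticaToHiveTransfer(
        task_id='stage_orders',
        sql='SELECT * FROM public.orders',  # placeholder query
        hive_table='staging.orders',        # dot notation targets a database
        create=True,
        recreate=False,
        partition={'ds': '{{ ds }}'},       # partition columns and values
        vertica_conn_id='vertica_default',
        hive_cli_conn_id='hive_cli_default',
        dag=dag)
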
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/9e19165c/_modules/webhdfs_hook.html
----------------------------------------------------------------------
diff --git a/_modules/webhdfs_hook.html b/_modules/webhdfs_hook.html
new file mode 100644
index 0000000..56be098
--- /dev/null
+++ b/_modules/webhdfs_hook.html
@@ -0,0 +1,287 @@
+
+
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  
+  <title>webhdfs_hook &mdash; Airflow Documentation</title>
+  
+
+  
+  
+
+  
+
+  
+  
+    
+
+  
+
+  
+  
+    <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+  
+
+  
+
+  
+    <link rel="top" title="Airflow Documentation" href="../index.html"/>
+        <link rel="up" title="Module code" href="index.html"/> 
+
+  
+  <script src="../_static/js/modernizr.min.js"></script>
+
+</head>
+
+<body class="wy-body-for-nav" role="document">
+
+  <div class="wy-grid-for-nav">
+
+    
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-scroll">
+        <div class="wy-side-nav-search">
+          
+
+          
+            <a href="../index.html" class="icon icon-home"> Airflow
+          
+
+          
+          </a>
+
+          
+            
+            
+          
+
+          
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+
+          
+        </div>
+
+        <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+          
+            
+            
+                <ul>
+<li class="toctree-l1"><a class="reference internal" href="../project.html">Project</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../license.html">License</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../start.html">Quick Start</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../installation.html">Installation</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../tutorial.html">Tutorial</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../configuration.html">Configuration</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../ui.html">UI / Screenshots</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../concepts.html">Concepts</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../profiling.html">Data Profiling</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../cli.html">Command Line Interface</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../scheduler.html">Scheduling &amp; Triggers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../plugins.html">Plugins</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../security.html">Security</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../faq.html">FAQ</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../code.html">API Reference</a></li>
+</ul>
+
+            
+          
+        </div>
+      </div>
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
+
+      
+      <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
+        <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+        <a href="../index.html">Airflow</a>
+      </nav>
+
+
+      
+      <div class="wy-nav-content">
+        <div class="rst-content">
+          
+
+
+
+
+
+<div role="navigation" aria-label="breadcrumbs navigation">
+  <ul class="wy-breadcrumbs">
+    <li><a href="../index.html">Docs</a> &raquo;</li>
+      
+          <li><a href="index.html">Module code</a> &raquo;</li>
+      
+    <li>webhdfs_hook</li>
+      <li class="wy-breadcrumbs-aside">
+        
+          
+        
+      </li>
+  </ul>
+  <hr/>
+</div>
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+            
+  <h1>Source code for webhdfs_hook</h1><div class="highlight"><pre>
+<span></span><span class="kn">from</span> <span class="nn">airflow.hooks.base_hook</span> <span class="kn">import</span> <span class="n">BaseHook</span>
+<span class="kn">from</span> <span class="nn">airflow</span> <span class="kn">import</span> <span class="n">configuration</span>
+<span class="kn">import</span> <span class="nn">logging</span>
+
+<span class="kn">from</span> <span class="nn">hdfs</span> <span class="kn">import</span> <span class="n">InsecureClient</span><span class="p">,</span> <span class="n">HdfsError</span>
+
+<span class="n">_kerberos_security_mode</span> <span class="o">=</span> <span class="n">configuration</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;core&quot;</span><span class="p">,</span> <span class="s2">&quot;security&quot;</span><span class="p">)</span> <span class="o">==</span> <span class="s2">&quot;kerberos&quot;</span>
+<span class="k">if</span> <span class="n">_kerberos_security_mode</span><span class="p">:</span>
+  <span class="k">try</span><span class="p">:</span>
+    <span class="kn">from</span> <span class="nn">hdfs.ext.kerberos</span> <span class="kn">import</span> <span class="n">KerberosClient</span>
+  <span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span>
+    <span class="n">logging</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s2">&quot;Could not load the Kerberos extension for the WebHDFSHook.&quot;</span><span class="p">)</span>
+    <span class="k">raise</span>
+<span class="kn">from</span> <span class="nn">airflow.exceptions</span> <span class="kn">import</span> <span class="n">AirflowException</span>
+
+
+<span class="k">class</span> <span class="nc">AirflowWebHDFSHookException</span><span class="p">(</span><span class="n">AirflowException</span><span class="p">):</span>
+    <span class="k">pass</span>
+
+
+<div class="viewcode-block" id="WebHDFSHook"><a class="viewcode-back" href="../code.html#airflow.hooks.WebHDFSHook">[docs]</a><span class="k">class</span> <span class="nc">WebHDFSHook</span><span class="p">(</span><span class="n">BaseHook</span><span class="p">):</span>
+    <span class="sd">&quot;&quot;&quot;</span>
+<span class="sd">    Interact with HDFS. This class is a wrapper around the hdfscli library.</span>
+<span class="sd">    &quot;&quot;&quot;</span>
+    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">webhdfs_conn_id</span><span class="o">=</span><span class="s1">&#39;webhdfs_default&#39;</span><span class="p">,</span> <span class="n">proxy_user</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">webhdfs_conn_id</span> <span class="o">=</span> <span class="n">webhdfs_conn_id</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">proxy_user</span> <span class="o">=</span> <span class="n">proxy_user</span>
+
+<div class="viewcode-block" id="WebHDFSHook.get_conn"><a class="viewcode-back" href="../code.html#airflow.hooks.WebHDFSHook.get_conn">[docs]</a>    <span class="k">def</span> <span class="nf">get_conn</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
+        <span class="sd">&quot;&quot;&quot;</span>
+<span class="sd">        Returns a hdfscli InsecureClient object.</span>
+<span class="sd">        &quot;&quot;&quot;</span>
+        <span class="n">nn_connections</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_connections</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">webhdfs_conn_id</span><span class="p">)</span>
+        <span class="k">for</span> <span class="n">nn</span> <span class="ow">in</span> <span class="n">nn_connections</span><span class="p">:</span>
+            <span class="k">try</span><span class="p">:</span>
+                <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s1">&#39;Trying namenode {}&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">host</span><span class="p">))</span>
+                <span class="n">connection_str</span> <span class="o">=</span> <span class="s1">&#39;http://{nn.host}:{nn.port}&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">nn</span><span class="o">=</span><span class="n">nn</span><span class="p">)</span>
+                <span class="k">if</span> <span class="n">_kerberos_security_mode</span><span class="p">:</span>
+                  <span class="n">client</span> <span class="o">=</span> <span class="n">KerberosClient</span><span class="p">(</span><span class="n">connection_str</span><span class="p">)</span>
+                <span class="k">else</span><span class="p">:</span>
+                  <span class="n">proxy_user</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">proxy_user</span> <span class="ow">or</span> <span class="n">nn</span><span class="o">.</span><span class="n">login</span>
+                  <span class="n">client</span> <span class="o">=</span> <span class="n">InsecureClient</span><span class="p">(</span><span class="n">connection_str</span><span class="p">,</span> <span class="n">user</span><span class="o">=</span><span class="n">proxy_user</span><span class="p">)</span>
+                <span class="n">client</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="s1">&#39;/&#39;</span><span class="p">)</span>
+                <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s1">&#39;Using namenode {} for hook&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">host</span><span class="p">))</span>
+                <span class="k">return</span> <span class="n">client</span>
+            <span class="k">except</span> <span class="n">HdfsError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
+                <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;Read operation on namenode {nn.host} failed with&quot;</span>
+                              <span class="s2">&quot; error: {e.message}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="o">**</span><span class="nb">locals</span><span class="p">()))</span>
+        <span class="n">nn_hosts</span> <span class="o">=</span> <span class="p">[</span><span class="n">c</span><span class="o">.</span><span class="n">host</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">nn_connections</span><span class="p">]</span>
+        <span class="n">no_nn_error</span> <span class="o">=</span> <span class="s2">&quot;Read operations failed on the namenodes below:</span><span class="se">\n</span><span class="s2">{}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;</span><span class="se">\n</span><span class="s2">&quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">nn_hosts</span><span class="p">))</span>
+        <span class="k">raise</span> <span class="n">AirflowWebHDFSHookException</span><span class="p">(</span><span class="n">no_nn_error</span><span class="p">)</span></div>
+
+<div class="viewcode-block" id="WebHDFSHook.check_for_path"><a class="viewcode-back" href="../code.html#airflow.hooks.WebHDFSHook.check_for_path">[docs]</a>    <span class="k">def</span> <span class="nf">check_for_path</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hdfs_path</span><span class="p">):</span>
+        <span class="sd">&quot;&quot;&quot;</span>
+<span class="sd">        Check for the existence of a path in HDFS by querying FileStatus.</span>
+<span class="sd">        &quot;&quot;&quot;</span>
+        <span class="n">c</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_conn</span><span class="p">()</span>
+        <span class="k">return</span> <span class="nb">bool</span><span class="p">(</span><span class="n">c</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="n">hdfs_path</span><span class="p">,</span> <span class="n">strict</span><span class="o">=</span><span class="bp">False</span><span class="p">))</span></div>
+
+<div class="viewcode-block" id="WebHDFSHook.load_file"><a class="viewcode-back" href="../code.html#airflow.hooks.WebHDFSHook.load_file">[docs]</a>    <span class="k">def</span> <span class="nf">load_file</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">source</span><span class="p">,</span> <span class="n">destination</span><span class="p">,</span> <span class="n">overwrite</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">parallelism</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
+                  <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
+        <span class="sd">&quot;&quot;&quot;</span>
+<span class="sd">        Uploads a file to HDFS</span>
+
+<span class="sd">        :param source: Local path to file or folder. If a folder, all the files</span>
+<span class="sd">          inside of it will be uploaded (note that this implies that folders empty</span>
+<span class="sd">          of files will not be created remotely).</span>
+<span class="sd">        :type source: str</span>
+<span class="sd">    :param destination: Target HDFS path. If it already exists and is a</span>
+<span class="sd">          directory, files will be uploaded inside.</span>
+<span class="sd">        :type destination: str</span>
+<span class="sd">        :param overwrite: Overwrite any existing file or directory.</span>
+<span class="sd">        :type overwrite: bool</span>
+<span class="sd">        :param parallelism: Number of threads to use for parallelization. A value of</span>
+<span class="sd">          `0` (or negative) uses as many threads as there are files.</span>
+<span class="sd">        :type parallelism: int</span>
+<span class="sd">        :param \*\*kwargs: Keyword arguments forwarded to :meth:`upload`.</span>
+
+
+<span class="sd">        &quot;&quot;&quot;</span>
+        <span class="n">c</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_conn</span><span class="p">()</span>
+        <span class="n">c</span><span class="o">.</span><span class="n">upload</span><span class="p">(</span><span class="n">hdfs_path</span><span class="o">=</span><span class="n">destination</span><span class="p">,</span>
+                 <span class="n">local_path</span><span class="o">=</span><span class="n">source</span><span class="p">,</span>
+                 <span class="n">overwrite</span><span class="o">=</span><span class="n">overwrite</span><span class="p">,</span>
+                 <span class="n">n_threads</span><span class="o">=</span><span class="n">parallelism</span><span class="p">,</span>
+                 <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
+        <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;Uploaded file {} to {}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">destination</span><span class="p">))</span></div></div>
+</pre></div>
+
+           </div>
+          </div>
+          <footer>
+  
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2014, Maxime Beauchemin, Airbnb.
+
+    </p>
+  </div>
+  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. 
+
+</footer>
+
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+  
+
+
+  
+
+    <script type="text/javascript">
+        var DOCUMENTATION_OPTIONS = {
+            URL_ROOT:'../',
+            VERSION:'',
+            COLLAPSE_INDEX:false,
+            FILE_SUFFIX:'.html',
+            HAS_SOURCE:  true
+        };
+    </script>
+      <script type="text/javascript" src="../_static/jquery.js"></script>
+      <script type="text/javascript" src="../_static/underscore.js"></script>
+      <script type="text/javascript" src="../_static/doctools.js"></script>
+
+  
+
+  
+  
+    <script type="text/javascript" src="../_static/js/theme.js"></script>
+  
+
+  
+  
+  <script type="text/javascript">
+      jQuery(function () {
+          SphinxRtdTheme.StickyNav.enable();
+      });
+  </script>
+   
+
+</body>
+</html>
\ No newline at end of file

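Likewise, a short sketch of calling the WebHDFSHook above directly, assuming a ``webhdfs_default`` connection that points at a reachable namenode; the paths are placeholders::

    from airflow.hooks import WebHDFSHook

    hook = WebHDFSHook(webhdfs_conn_id='webhdfs_default')

    # Upload a local file (or folder) to HDFS, overwriting any existing target.
    hook.load_file('/tmp/report.csv', '/data/reports/', overwrite=True)

    # FileStatus-based existence check, as implemented in check_for_path().
    if hook.check_for_path('/data/reports/report.csv'):
        print('upload confirmed')
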
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/9e19165c/_sources/cli.txt
----------------------------------------------------------------------
diff --git a/_sources/cli.txt b/_sources/cli.txt
new file mode 100644
index 0000000..f05cbfb
--- /dev/null
+++ b/_sources/cli.txt
@@ -0,0 +1,11 @@
+Command Line Interface
+======================
+
+Airflow has a very rich command line interface that allows for
+many types of operations on a DAG, for starting services, and for
+supporting development and testing.
+
+.. argparse::
+   :module: airflow.bin.cli
+   :func: get_parser
+   :prog: airflow

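For reference, the parser behind this CLI is the ``get_parser`` function named in the directive above, so it can also be inspected from Python; this is only a sketch, and the exact subcommands depend on the installed version::

    from airflow.bin.cli import get_parser

    parser = get_parser()
    parser.print_help()  # prints the same usage text as `airflow --help`
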
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/9e19165c/_sources/code.txt
----------------------------------------------------------------------
diff --git a/_sources/code.txt b/_sources/code.txt
new file mode 100644
index 0000000..8a48b81
--- /dev/null
+++ b/_sources/code.txt
@@ -0,0 +1,243 @@
+API Reference
+=============
+
+Operators
+---------
+Operators allow for generation of certain types of tasks that become nodes in
+the DAG when instantiated. All operators derive from BaseOperator and
+inherit many attributes and methods that way. Refer to the BaseOperator
+documentation for more details.
+
+There are 3 main types of operators:
+
+- Operators that perform an **action**, or tell another system to
+  perform an action
+- **Transfer** operators move data from one system to another
+- **Sensors** are a type of operator that will keep running until a certain
+  criterion is met. Examples include a specific file landing in HDFS or
+  S3, a partition appearing in Hive, or a specific time of the day. Sensors
+  are derived from ``BaseSensorOperator`` and run a poke
+  method at a specified ``poke_interval`` until it returns ``True``.
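To make the sensor contract concrete, a minimal hypothetical sensor; the class name and the condition it polls are invented for illustration::

    import os

    from airflow.operators.sensors import BaseSensorOperator
    from airflow.utils.decorators import apply_defaults


    class FlagFileSensor(BaseSensorOperator):
        """Waits until a local flag file exists (illustrative only)."""

        @apply_defaults
        def __init__(self, filepath, *args, **kwargs):
            super(FlagFileSensor, self).__init__(*args, **kwargs)
            self.filepath = filepath

        def poke(self, context):
            # Called every ``poke_interval`` seconds until it returns True
            # or the sensor's ``timeout`` is reached.
            return os.path.exists(self.filepath)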
+
+BaseOperator
+''''''''''''
+All operators are derived from ``BaseOperator`` and acquire much
+functionality through inheritance. Since this is the core of the engine,
+it's worth taking the time to understand the parameters of ``BaseOperator``
+to understand the primitive features that can be leveraged in your
+DAGs.
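For instance, a hedged sketch of the kind of ``BaseOperator`` arguments every operator inherits; the BashOperator and its command are placeholders, and ``dag`` is assumed to be defined elsewhere::

    from datetime import timedelta

    from airflow.operators import BashOperator

    echo_date = BashOperator(
        task_id='echo_date',
        bash_command='echo {{ ds }}',
        # The keyword arguments below all come from BaseOperator:
        retries=3,
        retry_delay=timedelta(minutes=5),
        depends_on_past=False,
        dag=dag)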
+
+
+.. autoclass:: airflow.models.BaseOperator
+
+
+BaseSensorOperator
+'''''''''''''''''''
+All sensors are derived from ``BaseSensorOperator``. All sensors inherit
+the ``timeout`` and ``poke_interval`` on top of the ``BaseOperator``
+attributes.
+
+.. autoclass:: airflow.operators.sensors.BaseSensorOperator
+
+
+Operator API
+''''''''''''
+
+.. automodule:: airflow.operators
+    :show-inheritance:
+    :members:
+        BashOperator,
+        BranchPythonOperator,
+        TriggerDagRunOperator,
+        DummyOperator,
+        EmailOperator,
+        ExternalTaskSensor,
+        GenericTransfer,
+        HdfsSensor,
+        Hive2SambaOperator,
+        HiveOperator,
+        HivePartitionSensor,
+        HiveToDruidTransfer,
+        HiveToMySqlTransfer,
+        SimpleHttpOperator,
+        HttpSensor,
+        MetastorePartitionSensor,
+        MsSqlOperator,
+        MsSqlToHiveTransfer,
+        MySqlOperator,
+        MySqlToHiveTransfer,
+        PostgresOperator,
+        PrestoCheckOperator,
+        PrestoIntervalCheckOperator,
+        PrestoValueCheckOperator,
+        PythonOperator,
+        S3KeySensor,
+        S3ToHiveTransfer,
+        ShortCircuitOperator,
+        SlackAPIOperator,
+        SlackAPIPostOperator,
+        SqlSensor,
+        SubDagOperator,
+        TimeSensor,
+        WebHdfsSensor
+
+.. autoclass:: airflow.operators.docker_operator.DockerOperator
+
+
+Community-contributed Operators
+'''''''''''''''''''''''''''''''
+
+.. automodule:: airflow.contrib.operators
+    :show-inheritance:
+    :members:
+        BigQueryOperator,
+        BigQueryToCloudStorageOperator,
+        GoogleCloudStorageDownloadOperator,
+        SSHExecuteOperator,
+        VerticaOperator,
+        VerticaToHiveTransfer
+
+.. autoclass:: airflow.contrib.operators.QuboleOperator
+.. autoclass:: airflow.contrib.operators.hipchat_operator.HipChatAPIOperator
+.. autoclass:: airflow.contrib.operators.hipchat_operator.HipChatAPISendRoomNotificationOperator
+
+.. _macros:
+
+Macros
+---------
+Here's a list of the variables and macros that can be used in templates.
+
+
+Default Variables
+'''''''''''''''''
+The Airflow engine passes a few variables by default that are accessible
+in all templates.
+
+=================================   ====================================
+Variable                            Description
+=================================   ====================================
+``{{ ds }}``                        the execution date as ``YYYY-MM-DD``
+``{{ ds_nodash }}``                 the execution date as ``YYYYMMDD``
+``{{ yesterday_ds }}``              yesterday's date as ``YYYY-MM-DD``
+``{{ yesterday_ds_nodash }}``       yesterday's date as ``YYYYMMDD``
+``{{ tomorrow_ds }}``               tomorrow's date as ``YYYY-MM-DD``
+``{{ tomorrow_ds_nodash }}``        tomorrow's date as ``YYYYMMDD``
+``{{ ts }}``                        same as ``execution_date.isoformat()``
+``{{ ts_nodash }}``                 same as ``ts`` without ``-`` and ``:``
+``{{ execution_date }}``            the execution_date (datetime.datetime)
+``{{ dag }}``                       the DAG object
+``{{ task }}``                      the Task object
+``{{ macros }}``                    a reference to the macros package, described below
+``{{ task_instance }}``             the task_instance object
+``{{ end_date }}``                  same as ``{{ ds }}``
+``{{ latest_date }}``               same as ``{{ ds }}``
+``{{ ti }}``                        same as ``{{ task_instance }}``
+``{{ params }}``                    a reference to the user-defined params dictionary
+``{{ task_instance_key_str }}``     a unique, human-readable key to the task instance
+                                    formatted ``{dag_id}_{task_id}_{ds}``
+``conf``                            the full configuration object located at
+                                    ``airflow.configuration.conf`` which
+                                    represents the content of your
+                                    ``airflow.cfg``
+``run_id``                          the ``run_id`` of the current DAG run
+``dag_run``                         a reference to the DagRun object
+``test_mode``                       whether the task instance was called using
+                                    the CLI's test subcommand
+=================================   ====================================
+
+Note that you can access the object's attributes and methods with simple
+dot notation. Here are some examples of what is possible:
+``{{ task.owner }}``, ``{{ task.task_id }}``, ``{{ ti.hostname }}``, ...
+Refer to the models documentation for more information on the objects'
+attributes and methods.
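A small hedged example of these default variables in a templated command; the task and command are placeholders, and ``dag`` is assumed to be defined elsewhere::

    from airflow.operators import BashOperator

    templated = BashOperator(
        task_id='print_context',
        # {{ ds }}, {{ ts }} and {{ params }} come from the table above.
        bash_command='echo "ds={{ ds }} ts={{ ts }} owner={{ params.owner }}"',
        params={'owner': 'data-team'},  # the user-defined params dictionary
        dag=dag)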
+
+Macros
+''''''
+Macros are a way to expose objects to your templates and live under the
+``macros`` namespace in your templates.
+
+A few commonly used libraries and methods are made available.
+
+
+=================================   ====================================
+Variable                            Description
+=================================   ====================================
+``macros.datetime``                 The standard lib's ``datetime.datetime``
+``macros.timedelta``                The standard lib's ``datetime.timedelta``
+``macros.dateutil``                 A reference to the ``dateutil`` package
+``macros.time``                     The standard lib's ``time``
+``macros.uuid``                     The standard lib's ``uuid``
+``macros.random``                   The standard lib's ``random``
+=================================   ====================================
+
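+For example, these objects can be combined with the default variables inside
+any templated field; an illustrative sketch:
+
+.. code:: python
+
+    # one week after the execution date, formatted as YYYY-MM-DD,
+    # computed entirely inside the template using macros.timedelta
+    templated_command = "echo {{ (execution_date + macros.timedelta(days=7)).strftime('%Y-%m-%d') }}"
+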
+
+Some Airflow-specific macros are also defined:
+
+.. automodule:: airflow.macros
+    :show-inheritance:
+    :members:
+
+.. automodule:: airflow.macros.hive
+    :show-inheritance:
+    :members:
+
+.. _models_ref:
+
+Models
+------
+
+Models are built on top of the SQLAlchemy ORM Base class, and instances are
+persisted in the database.
+
+
+.. automodule:: airflow.models
+    :show-inheritance:
+    :members: DAG, BaseOperator, TaskInstance, DagBag, Connection
+
+Hooks
+-----
+.. automodule:: airflow.hooks
+    :show-inheritance:
+    :members:
+        DbApiHook,
+        HiveCliHook,
+        HiveMetastoreHook,
+        HiveServer2Hook,
+        HttpHook,
+        DruidHook,
+        MsSqlHook,
+        MySqlHook,
+        PostgresHook,
+        PrestoHook,
+        S3Hook,
+        SqliteHook,
+        WebHDFSHook
+
+Community contributed hooks
+'''''''''''''''''''''''''''
+
+.. automodule:: airflow.contrib.hooks
+    :show-inheritance:
+    :members:
+        BigQueryHook,
+        GoogleCloudStorageHook,
+        VerticaHook,
+        FTPHook,
+        SSHHook,
+        CloudantHook
+
+Executors
+---------
+Executors are the mechanism by which task instances get run.
+
+.. automodule:: airflow.executors
+    :show-inheritance:
+    :members: LocalExecutor, CeleryExecutor, SequentialExecutor
+
+Community-contributed executors
+'''''''''''''''''''''''''''''''
+
+.. automodule:: airflow.contrib.executors
+    :show-inheritance:
+    :members:
+        MesosExecutor

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/9e19165c/_sources/concepts.txt
----------------------------------------------------------------------
diff --git a/_sources/concepts.txt b/_sources/concepts.txt
new file mode 100644
index 0000000..6e15ff8
--- /dev/null
+++ b/_sources/concepts.txt
@@ -0,0 +1,758 @@
+Concepts
+########
+
+The Airflow Platform is a tool for describing, executing, and monitoring
+workflows.
+
+Core Ideas
+''''''''''
+
+DAGs
+====
+
+In Airflow, a ``DAG`` -- or a Directed Acyclic Graph -- is a collection of all
+the tasks you want to run, organized in a way that reflects their relationships
+and dependencies.
+
+For example, a simple DAG could consist of three tasks: A, B, and C. It could
+say that A has to run successfully before B can run, but C can run anytime. It
+could say that task A times out after 5 minutes, and B can be restarted up to 5
+times in case it fails. It might also say that the workflow will run every night
+at 10pm, but shouldn't start until a certain date.
+
+In this way, a DAG describes *how* you want to carry out your workflow; but
+notice that we haven't said anything about *what* we actually want to do! A, B,
+and C could be anything. Maybe A prepares data for B to analyze while C sends an
+email. Or perhaps A monitors your location so B can open your garage door while
+C turns on your house lights. The important thing is that the DAG isn't
+concerned with what its constituent tasks do; its job is to make sure that
+whatever they do happens at the right time, or in the right order, or with the
+right handling of any unexpected issues.
+
+DAGs are defined in standard Python files that are placed in Airflow's
+``DAG_FOLDER``. Airflow will execute the code in each file to dynamically build
+the ``DAG`` objects. You can have as many DAGs as you want, each describing an
+arbitrary number of tasks. In general, each one should correspond to a single
+logical workflow.
+
+Scope
+-----
+
+Airflow will load any ``DAG`` object it can import from a DAGfile. Critically,
+that means the DAG must appear in ``globals()``. Consider the following two
+DAGs. Only ``dag_1`` will be loaded; the other one only appears in a local
+scope.
+
+.. code:: python
+
+    dag_1 = DAG('this_dag_will_be_discovered')
+
+    def my_function():
+        dag_2 = DAG('but_this_dag_will_not')
+
+    my_function()
+
+Sometimes this can be put to good use. For example, a common pattern with
+``SubDagOperator`` is to define the subdag inside a function so that Airflow
+doesn't try to load it as a standalone DAG.
+
+Default Arguments
+-----------------
+
+If a dictionary of ``default_args`` is passed to a DAG, it will apply them to
+any of its operators. This makes it easy to apply a common parameter to many operators without having to type it many times.
+
+.. code:: python
+
+    default_args=dict(
+        start_date=datetime(2016, 1, 1),
+        owner='Airflow')
+
+    dag = DAG('my_dag', default_args=default_args)
+    op = DummyOperator(task_id='dummy', dag=dag)
+    print(op.owner) # Airflow
+
+Context Manager
+---------------
+
+*Added in Airflow 1.8*
+
+DAGs can be used as context managers to automatically assign new operators to that DAG.
+
+.. code:: python
+
+    with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
+        op = DummyOperator('op')
+
+    op.dag is dag # True
+
+Operators
+=========
+
+While DAGs describe *how* to run a workflow, ``Operators`` determine what
+actually gets done.
+
+An operator describes a single task in a workflow. Operators are usually (but
+not always) atomic, meaning they can stand on their own and don't need to share
+resources with any other operators. The DAG will make sure that operators run in
+the correct order; other than those dependencies, operators generally
+run independently. In fact, they may run on two completely different machines.
+
+This is a subtle but very important point: in general, if two operators need to
+share information, like a filename or small amount of data, you should consider
+combining them into a single operator. If it absolutely can't be avoided,
+Airflow does have a feature for operator cross-communication called XCom that is
+described elsewhere in this document.
+
+Airflow provides operators for many common tasks, including:
+
+- ``BashOperator`` - executes a bash command
+- ``PythonOperator`` - calls an arbitrary Python function
+- ``EmailOperator`` - sends an email
+- ``HTTPOperator`` - sends an HTTP request
+- ``SqlOperator`` - executes a SQL command
+- ``Sensor`` - waits for a certain time, file, database row, S3 key, etc...
+
+
+In addition to these basic building blocks, there are many more specific
+operators: ``DockerOperator``, ``HiveOperator``, ``S3FileTransferOperator``,
+``PrestoToMysqlOperator``, ``SlackOperator``... you get the idea!
+
+The ``airflow/contrib/`` directory contains yet more operators built by the
+community. These operators aren't always as complete or well-tested as those in
+the main distribution, but allow users to more easily add new functionality to
+the platform.
+
+Operators are only loaded by Airflow if they are assigned to a DAG.
+
+DAG Assignment
+--------------
+
+*Added in Airflow 1.8*
+
+Operators do not have to be assigned to DAGs immediately (previously ``dag`` was
+a required argument). However, once an operator is assigned to a DAG, it can not
+be transferred or unassigned. DAG assignment can be done explicitly when the
+operator is created, through deferred assignment, or even inferred from other
+operators.
+
+.. code:: python
+
+    dag = DAG('my_dag', start_date=datetime(2016, 1, 1))
+
+    # sets the DAG explicitly
+    explicit_op = DummyOperator(task_id='op1', dag=dag)
+
+    # deferred DAG assignment
+    deferred_op = DummyOperator(task_id='op2')
+    deferred_op.dag = dag
+
+    # inferred DAG assignment (linked operators must be in the same DAG)
+    inferred_op = DummyOperator(task_id='op3')
+    inferred_op.set_upstream(deferred_op)
+
+
+Bitshift Composition
+--------------------
+
+*Added in Airflow 1.8*
+
+Traditionally, operator relationships are set with the ``set_upstream()`` and
+``set_downstream()`` methods. In Airflow 1.8, this can be done with the Python
+bitshift operators ``>>`` and ``<<``. The following four statements are all
+functionally equivalent:
+
+.. code:: python
+
+    op1 >> op2
+    op1.set_downstream(op2)
+
+    op2 << op1
+    op2.set_upstream(op1)
+
+When using the bitshift to compose operators, the relationship is set in the
+direction that the bitshift operator points. For example, ``op1 >> op2`` means
+that ``op1`` runs first and ``op2`` runs second. Multiple operators can be
+composed -- keep in mind the chain is executed left-to-right and the rightmost
+object is always returned. For example:
+
+.. code:: python
+
+    op1 >> op2 >> op3 << op4
+
+is equivalent to:
+
+.. code:: python
+
+    op1.set_downstream(op2)
+    op2.set_downstream(op3)
+    op3.set_upstream(op4)
+
+For convenience, the bitshift operators can also be used with DAGs. For example:
+
+.. code:: python
+
+    dag >> op1 >> op2
+
+is equivalent to:
+
+.. code:: python
+
+    op1.dag = dag
+    op1.set_downstream(op2)
+
+We can put this all together to build a simple pipeline:
+
+.. code:: python
+
+    with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
+        (
+            dag
+            >> DummyOperator(task_id='dummy_1')
+            >> BashOperator(
+                task_id='bash_1',
+                bash_command='echo "HELLO!"')
+            >> PythonOperator(
+                task_id='python_1',
+                python_callable=lambda: print("GOODBYE!"))
+        )
+
+Tasks
+=====
+
+Once an operator is instantiated, it is referred to as a "task". The
+instantiation defines specific values when calling the abstract operator, and
+the parameterized task becomes a node in a DAG.
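+
+For example, instantiating a ``BashOperator`` with concrete parameter values
+yields a task (the names below are illustrative):
+
+.. code:: python
+
+    run_this = BashOperator(
+        task_id='print_date',
+        bash_command='date',
+        dag=dag)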
+
+Task Instances
+==============
+
+A task instance represents a specific run of a task and is characterized as the
+combination of a dag, a task, and a point in time. Task instances also have an
+indicative state, which could be "running", "success", "failed", "skipped", "up
+for retry", etc.
+
+Workflows
+=========
+
+You're now familiar with the core building blocks of Airflow.
+Some of the concepts may sound very similar, but the vocabulary can
+be conceptualized like this:
+
+- DAG: a description of the order in which work should take place
+- Operator: a class that acts as a template for carrying out some work
+- Task: a parameterized instance of an operator
+- Task Instance: a task that 1) has been assigned to a DAG and 2) has a
+  state associated with a specific run of the DAG
+
+By combining ``DAGs`` and ``Operators`` to create ``TaskInstances``, you can
+build complex workflows.
+
+Additional Functionality
+''''''''''''''''''''''''
+
+In addition to the core Airflow objects, there are a number of more complex
+features that enable behaviors like limiting simultaneous access to resources,
+cross-communication, conditional execution, and more.
+
+Hooks
+=====
+
+Hooks are interfaces to external platforms and databases like Hive, S3,
+MySQL, Postgres, HDFS, and Pig. Hooks implement a common interface when
+possible, and act as a building block for operators. They also use
+the ``airflow.models.Connection`` model to retrieve hostnames
+and authentication information. Hooks keep authentication code and
+information out of pipelines, centralized in the metadata database.
+
+Hooks are also very useful on their own in Python scripts, in
+``airflow.operators.PythonOperator`` callables, and in interactive environments
+like IPython or Jupyter notebooks.
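+
+As a sketch, a hook can be used directly inside a ``python_callable``; the
+connection id and table name below are examples only:
+
+.. code:: python
+
+    from airflow.hooks import PostgresHook
+
+    def count_rows():
+        # the hook pulls host / login / password from the centrally
+        # managed connection, keeping credentials out of pipeline code
+        hook = PostgresHook(postgres_conn_id='postgres_default')
+        return hook.get_records('SELECT COUNT(1) FROM my_table')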
+
+Pools
+=====
+
+Some systems can get overwhelmed when too many processes hit them at the same
+time. Airflow pools can be used to **limit the execution parallelism** on
+arbitrary sets of tasks. The list of pools is managed in the UI
+(``Menu -> Admin -> Pools``) by giving each pool a name and assigning
+it a number of worker slots. Tasks can then be associated with
+one of the existing pools by using the ``pool`` parameter when
+creating tasks (i.e., instantiating operators).
+
+.. code:: python
+
+    aggregate_db_message_job = BashOperator(
+        task_id='aggregate_db_message_job',
+        execution_timeout=timedelta(hours=3),
+        pool='ep_data_pipeline_db_msg_agg',
+        bash_command=aggregate_db_message_job_cmd,
+        dag=dag)
+    aggregate_db_message_job.set_upstream(wait_for_empty_queue)
+
+The ``pool`` parameter can
+be used in conjunction with ``priority_weight`` to define priorities
+in the queue, and which tasks get executed first as slots open up in the
+pool. The default ``priority_weight`` is ``1``, and can be bumped to any
+number. When sorting the queue to evaluate which task should be executed
+next, we use the ``priority_weight``, summed up with all of the
+``priority_weight`` values from tasks downstream from this task. You can
+use this to bump a specific important task and the whole path to that task
+gets prioritized accordingly.
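+
+As a brief, hypothetical illustration of combining both parameters:
+
+.. code:: python
+
+    important_agg_job = BashOperator(
+        task_id='important_agg_job',
+        pool='ep_data_pipeline_db_msg_agg',
+        priority_weight=10,  # bumped above the default of 1
+        bash_command='run_aggregation.sh ',
+        dag=dag)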
+
+Tasks will be scheduled as usual while the slots fill up. Once capacity is
+reached, runnable tasks get queued and their state will show as such in the
+UI. As slots free up, queued tasks start running based on the
+``priority_weight`` (of the task and its descendants).
+
+Note that by default tasks aren't assigned to any pool and their
+execution parallelism is only limited by the executor's settings.
+
+Connections
+===========
+
+The connection information to external systems is stored in the Airflow
+metadata database and managed in the UI (``Menu -> Admin -> Connections``).
+A ``conn_id`` is defined there, with hostname / login / password / schema
+information attached to it. Airflow pipelines can simply refer to the
+centrally managed ``conn_id`` without having to hard code any of this
+information anywhere.
+
+Many connections with the same ``conn_id`` can be defined. When that
+is the case, and when a hook uses the ``get_connection`` method
+from ``BaseHook``, Airflow will choose one connection randomly, allowing
+for some basic load balancing and fault tolerance when used in conjunction
+with retries.
+
+Airflow also has the ability to reference connections via environment
+variables from the operating system. The environment variable needs to be
+prefixed with ``AIRFLOW_CONN_`` to be considered a connection. When
+referencing the connection in the Airflow pipeline, the ``conn_id`` should
+be the name of the variable without the prefix. For example, if the ``conn_id``
+is named ``POSTGRES_MASTER`` the environment variable should be named
+``AIRFLOW_CONN_POSTGRES_MASTER``. Airflow assumes the value returned
+from the environment variable to be in a URI format
+(e.g. ``postgres://user:password@localhost:5432/master``).
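+
+A minimal sketch of exposing such a connection from Python before the DAG is
+parsed (the URI is an example only):
+
+.. code:: python
+
+    import os
+
+    # defines a connection usable as conn_id 'postgres_master';
+    # Airflow parses the URI into host, login, password and schema
+    os.environ['AIRFLOW_CONN_POSTGRES_MASTER'] = (
+        'postgres://user:password@localhost:5432/master')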
+
+Queues
+======
+
+When using the CeleryExecutor, the celery queues that tasks are sent to
+can be specified. ``queue`` is an attribute of BaseOperator, so any
+task can be assigned to any queue. The default queue for the environment
+is defined in the ``airflow.cfg``'s ``celery -> default_queue``. This defines
+the queue that tasks get assigned to when not specified, as well as which
+queue Airflow workers listen to when started.
+
+Workers can listen to one or multiple queues of tasks. When a worker is
+started (using the command ``airflow worker``), a set of comma delimited
+queue names can be specified (e.g. ``airflow worker -q spark``). This worker
+will then only pick up tasks wired to the specified queue(s).
+
+This can be useful if you need specialized workers, either from a
+resource perspective (for say very lightweight tasks where one worker
+could take thousands of tasks without a problem), or from an environment
+perspective (you want a worker running from within the Spark cluster
+itself because it needs a very specific environment and security rights).
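+
+For example, a task can be pinned to a dedicated queue with the ``queue``
+argument; in this sketch, only workers started with ``airflow worker -q spark``
+would pick it up (names are illustrative):
+
+.. code:: python
+
+    spark_task = BashOperator(
+        task_id='run_on_spark_cluster',
+        bash_command='spark-submit my_job.py ',
+        queue='spark',
+        dag=dag)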
+
+XComs
+=====
+
+XComs let tasks exchange messages, allowing more nuanced forms of control and
+shared state. The name is an abbreviation of "cross-communication". XComs are
+principally defined by a key, value, and timestamp, but also track attributes
+like the task/DAG that created the XCom and when it should become visible. Any
+object that can be pickled can be used as an XCom value, so users should make
+sure to use objects of appropriate size.
+
+XComs can be "pushed" (sent) or "pulled" (received). When a task pushes an
+XCom, it makes it generally available to other tasks. Tasks can push XComs at
+any time by calling the ``xcom_push()`` method. In addition, if a task returns
+a value (either from its Operator's ``execute()`` method, or from a
+PythonOperator's ``python_callable`` function), then an XCom containing that
+value is automatically pushed.
+
+Tasks call ``xcom_pull()`` to retrieve XComs, optionally applying filters
+based on criteria like ``key``, source ``task_ids``, and source ``dag_id``. By
+default, ``xcom_pull()`` filters for the keys that are automatically given to
+XComs when they are pushed by being returned from execute functions (as
+opposed to XComs that are pushed manually).
+
+If ``xcom_pull`` is passed a single string for ``task_ids``, then the most
+recent XCom value from that task is returned; if a list of ``task_ids`` is
+passed, then a corresponding list of XCom values is returned.
+
+.. code:: python
+
+    # inside a PythonOperator called 'pushing_task';
+    # the return value is automatically pushed as an XCom
+    def push_function():
+        return 'pushed_value'
+
+    # inside another PythonOperator where provide_context=True
+    def pull_function(**context):
+        value = context['task_instance'].xcom_pull(task_ids='pushing_task')
+
+It is also possible to pull XComs directly in a template; here's an example
+of what this may look like:
+
+.. code:: sql
+
+    SELECT * FROM {{ task_instance.xcom_pull(task_ids='foo', key='table_name') }}
+
+Note that XComs are similar to `Variables`_, but are specifically designed
+for inter-task communication rather than global settings.
+
+
+Variables
+=========
+
+Variables are a generic way to store and retrieve arbitrary content or
+settings as a simple key value store within Airflow. Variables can be
+listed, created, updated and deleted from the UI (``Admin -> Variables``),
+code or CLI. While your pipeline code definition and most of your constants
+and variables should be defined in code and stored in source control,
+it can be useful to have some variables or configuration items
+accessible and modifiable through the UI.
+
+
+.. code:: python
+
+    from airflow.models import Variable
+    foo = Variable.get("foo")
+    bar = Variable.get("bar", deserialize_json=True)
+
+The second call assumes ``json`` content and will be deserialized into
+``bar``. Note that ``Variable`` is a sqlalchemy model and can be used
+as such.
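+
+Depending on your Airflow version, ``Variable.set`` may also be available for
+writing values from code; a small, hedged sketch (key and value are arbitrary):
+
+.. code:: python
+
+    from airflow.models import Variable
+
+    # store a plain string value under the key "foo"
+    Variable.set("foo", "bar")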
+
+
+Branching
+=========
+
+Sometimes you need a workflow to branch, or only go down a certain path
+based on an arbitrary condition which is typically related to something
+that happened in an upstream task. One way to do this is by using the
+``BranchPythonOperator``.
+
+The ``BranchPythonOperator`` is much like the PythonOperator except that it
+expects a python_callable that returns a task_id. The task_id returned
+is followed, and all of the other paths are skipped.
+The task_id returned by the Python function has to reference a task
+directly downstream from the BranchPythonOperator task.
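+
+A minimal sketch (the task ids and branching condition are made up for
+illustration):
+
+.. code:: python
+
+    even_day_task = DummyOperator(task_id='even_day_task', dag=dag)
+    odd_day_task = DummyOperator(task_id='odd_day_task', dag=dag)
+
+    def choose_branch(**kwargs):
+        # return the task_id of the branch to follow
+        if kwargs['execution_date'].day % 2 == 0:
+            return 'even_day_task'
+        return 'odd_day_task'
+
+    branching = BranchPythonOperator(
+        task_id='branching',
+        python_callable=choose_branch,
+        provide_context=True,
+        dag=dag)
+
+    branching.set_downstream(even_day_task)
+    branching.set_downstream(odd_day_task)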
+
+Note that using tasks with ``depends_on_past=True`` downstream from
+``BranchPythonOperator`` is logically unsound, as the ``skipped`` status
+will invariably block tasks that depend on their past successes.
+The ``skipped`` state propagates downstream when all directly upstream tasks
+are ``skipped``.
+
+If you want to skip some tasks, keep in mind that you can't have an empty
+path; if needed, add a dummy task instead.
+
+Like this, where the dummy task ``branch_false`` is skipped:
+
+.. image:: img/branch_good.png
+
+Not like this, where the join task is skipped:
+
+.. image:: img/branch_bad.png
+
+SubDAGs
+=======
+
+SubDAGs are perfect for repeating patterns. Defining a function that returns a
+DAG object is a nice design pattern when using Airflow.
+
+Airbnb uses the *stage-check-exchange* pattern when loading data. Data is staged
+in a temporary table, after which data quality checks are performed against
+that table. Once the checks all pass, the partition is moved into the production
+table.
+
+As another example, consider the following DAG:
+
+.. image:: img/subdag_before.png
+
+We can combine all of the parallel ``task-*`` operators into a single SubDAG,
+so that the resulting DAG resembles the following:
+
+.. image:: img/subdag_after.png
+
+Note that SubDAG operators should contain a factory method that returns a DAG
+object. This will prevent the SubDAG from being treated like a separate DAG in
+the main UI. For example:
+
+.. code:: python
+
+  # dags/subdag.py
+  from airflow.models import DAG
+  from airflow.operators import DummyOperator
+
+
+  # Dag is returned by a factory method
+  def sub_dag(parent_dag_name, child_dag_name, start_date, schedule_interval):
+    dag = DAG(
+      '%s.%s' % (parent_dag_name, child_dag_name),
+      schedule_interval=schedule_interval,
+      start_date=start_date,
+    )
+
+    dummy_operator = DummyOperator(
+      task_id='dummy_task',
+      dag=dag,
+    )
+
+    return dag
+
+This SubDAG can then be referenced in your main DAG file:
+
+.. code:: python
+
+  # main_dag.py
+  from datetime import datetime, timedelta
+  from airflow.models import DAG
+  from airflow.operators import SubDagOperator
+  from dags.subdag import sub_dag
+
+
+  PARENT_DAG_NAME = 'parent_dag'
+  CHILD_DAG_NAME = 'child_dag'
+
+  main_dag = DAG(
+    dag_id=PARENT_DAG_NAME,
+    schedule_interval=timedelta(hours=1),
+    start_date=datetime(2016, 1, 1)
+  )
+
+  sub_dag = SubDagOperator(
+    subdag=sub_dag(PARENT_DAG_NAME, CHILD_DAG_NAME, main_dag.start_date,
+                   main_dag.schedule_interval),
+    task_id=CHILD_DAG_NAME,
+    dag=main_dag,
+  )
+
+You can zoom into a SubDagOperator from the graph view of the main DAG to show
+the tasks contained within the SubDAG:
+
+.. image:: img/subdag_zoom.png
+
+Some other tips when using SubDAGs:
+
+-  by convention, a SubDAG's ``dag_id`` should be prefixed by its parent and
+   a dot, as in ``parent.child``
+-  share arguments between the main DAG and the SubDAG by passing arguments to
+   the SubDAG operator (as demonstrated above)
+-  SubDAGs must have a schedule and be enabled. If the SubDAG's schedule is
+   set to ``None`` or ``@once``, the SubDAG will succeed without having done
+   anything
+-  clearing a SubDagOperator also clears the state of the tasks within
+-  marking success on a SubDagOperator does not affect the state of the tasks
+   within
+-  refrain from using ``depends_on_past=True`` in tasks within the SubDAG as
+   this can be confusing
+-  it is possible to specify an executor for the SubDAG. It is common to use
+   the SequentialExecutor if you want to run the SubDAG in-process and
+   effectively limit its parallelism to one. Using LocalExecutor can be
+   problematic as it may over-subscribe your worker, running multiple tasks in
+   a single slot
+
+See ``airflow/example_dags`` for a demonstration.
+
+SLAs
+====
+
+Service Level Agreements, or time by which a task or DAG should have
+succeeded, can be set at a task level as a ``timedelta``. If
+one or many instances have not succeeded by that time, an alert email is sent
+detailing the list of tasks that missed their SLA. The event is also recorded
+in the database and made available in the web UI under ``Browse->Missed SLAs``
+where events can be analyzed and documented.
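+
+A brief sketch of attaching an SLA to a task (the two-hour window and names
+are arbitrary):
+
+.. code:: python
+
+    from datetime import timedelta
+
+    report = BashOperator(
+        task_id='generate_report',
+        bash_command='generate_report.sh ',
+        sla=timedelta(hours=2),  # alert if not finished within two hours
+        dag=dag)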
+
+
+Trigger Rules
+=============
+
+Though the normal workflow behavior is to trigger tasks when all their
+directly upstream tasks have succeeded, Airflow allows for more complex
+dependency settings.
+
+All operators have a ``trigger_rule`` argument which defines the rule by which
+the generated task gets triggered. The default value for ``trigger_rule`` is
+``all_success`` and can be defined as "trigger this task when all directly
+upstream tasks have succeeded". All other rules described here are based
+on direct parent tasks and are values that can be passed to any operator
+while creating tasks:
+
+* ``all_success``: (default) all parents have succeeded
+* ``all_failed``: all parents are in a ``failed`` or ``upstream_failed`` state
+* ``all_done``: all parents are done with their execution
+* ``one_failed``: fires as soon as at least one parent has failed, it does not wait for all parents to be done
+* ``one_success``: fires as soon as at least one parent succeeds, it does not wait for all parents to be done
+* ``dummy``: dependencies are just for show, trigger at will
+
+Note that these can be used in conjunction with ``depends_on_past`` (boolean)
+that, when set to ``True``, keeps a task from getting triggered if the
+previous schedule for the task hasn't succeeded.
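+
+For instance, a cleanup task can be made to run regardless of upstream
+failures (the names here are illustrative):
+
+.. code:: python
+
+    cleanup = BashOperator(
+        task_id='cleanup_staging',
+        bash_command='rm -rf /tmp/staging ',
+        trigger_rule='all_done',  # run once all parents are done, whatever their state
+        dag=dag)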
+
+
+Zombies & Undeads
+=================
+
+Task instances die all the time, usually as part of their normal life cycle,
+but sometimes unexpectedly.
+
+Zombie tasks are characterized by the absence
+of a heartbeat (emitted by the job periodically) and a ``running`` status
+in the database. They can occur when a worker node can't reach the database,
+when Airflow processes are killed externally, or when a node gets rebooted
+for instance. Zombie killing is performed periodically by the scheduler's
+process.
+
+Undead processes are characterized by the existence of a process and a matching
+heartbeat, but Airflow does not consider this task as ``running`` in the database.
+This mismatch typically occurs as the state of the database is altered,
+most likely by deleting rows in the "Task Instances" view in the UI.
+Tasks are instructed to verify their state as part of the heartbeat routine,
+and terminate themselves upon figuring out that they are in this "undead"
+state.
+
+
+Cluster Policy
+==============
+
+Your local airflow settings file can define a ``policy`` function that
+has the ability to mutate task attributes based on other task or DAG
+attributes. It receives a single argument, a reference to the task object,
+and is expected to alter its attributes.
+
+For example, this function could apply a specific queue property when
+using a specific operator, or enforce a task timeout policy, making sure
+that no tasks run for more than 48 hours. Here's an example of what this
+may look like inside your ``airflow_settings.py``:
+
+
+.. code:: python
+
+    def policy(task):
+        if task.__class__.__name__ == 'HivePartitionSensor':
+            task.queue = "sensor_queue"
+        if task.timeout > timedelta(hours=48):
+            task.timeout = timedelta(hours=48)
+
+
+Documentation & Notes
+=====================
+
+It's possible to add documentation or notes to your dags & task objects that
+become visible in the web interface ("Graph View" for dags, "Task Details" for
+tasks). There are a set of special task attributes that get rendered as rich
+content if defined:
+
+==========  ================
+attribute   rendered to
+==========  ================
+doc         monospace
+doc_json    json
+doc_yaml    yaml
+doc_md      markdown
+doc_rst     reStructuredText
+==========  ================
+
+Please note that for dags, ``doc_md`` is the only attribute interpreted.
+
+This is especially useful if your tasks are built dynamically from
+configuration files, as it allows you to expose the configuration that led
+to the related tasks in Airflow.
+
+.. code:: python
+
+    """
+    ### My great DAG
+    """
+
+    dag = DAG('my_dag', default_args=default_args)
+    dag.doc_md = __doc__
+
+    t = BashOperator(task_id='foo', bash_command='echo foo', dag=dag)
+    t.doc_md = """\
+    # Title
+    Here's a [url](www.airbnb.com)
+    """
+
+This content will get rendered as markdown respectively in the "Graph View" and
+"Task Details" pages.
+
+Jinja Templating
+================
+
+Airflow leverages the power of
+`Jinja Templating <http://jinja.pocoo.org/docs/dev/>`_ and this can be a
+powerful tool to use in combination with macros (see the :ref:`macros` section).
+
+For example, say you want to pass the execution date as an environment variable
+to a Bash script using the ``BashOperator``.
+
+.. code:: python
+
+  # The execution date as YYYY-MM-DD
+  date = "{{ ds }}"
+  t = BashOperator(
+      task_id='test_env',
+      bash_command='/tmp/test.sh ',
+      dag=dag,
+      env={'EXECUTION_DATE': date})
+
+Here, ``{{ ds }}`` is a macro, and because the ``env`` parameter of the
+``BashOperator`` is templated with Jinja, the execution date will be available
+as an environment variable named ``EXECUTION_DATE`` in your Bash script.
+
+You can use Jinja templating with every parameter that is marked as "templated"
+in the documentation.
+
+Packaged dags
+'''''''''''''
+While often you will specify dags in a single ``.py`` file, it might sometimes
+be necessary to combine a dag and its dependencies. For example, you might want
+to combine several dags in order to version or manage them together, or you
+might need an extra module that is not available by default on the system you
+are running Airflow on. To allow this, you can create a zip file that contains
+the dag(s) in the root of the zip file and has the extra modules unpacked in
+directories.
+
+For instance you can create a zip file that looks like this:
+
+.. code-block:: bash
+
+    my_dag1.py
+    my_dag2.py
+    package1/__init__.py
+    package1/functions.py
+
+Airflow will scan the zip file and try to load ``my_dag1.py`` and ``my_dag2.py``.
+It will not go into subdirectories as these are considered to be potential
+packages.
+
+In case you would like to add module dependencies to your DAG, you basically
+do the same, but it then makes more sense to use a virtualenv and pip.
+
+.. code-block:: bash
+
+    virtualenv zip_dag
+    source zip_dag/bin/activate
+
+    mkdir zip_dag_contents
+    cd zip_dag_contents
+
+    pip install --install-option="--install-lib=$PWD" my_useful_package
+    cp ~/my_dag.py .
+
+    zip -r zip_dag.zip *
+
+.. note:: the zip file will be inserted at the beginning of the module search
+   list (``sys.path``) and as such will be available to any other code that
+   resides within the same interpreter.
+
+.. note:: packaged dags cannot be used with pickling turned on.
+
+.. note:: packaged dags cannot contain dynamic libraries (e.g. ``libz.so``); these
+   need to be available on the system if a module needs them. In other words, only
+   pure Python modules can be packaged.
+

