asterixdb-commits mailing list archives

From ima...@apache.org
Subject [01/36] asterixdb-site git commit: Add 0.9.1 Documentation
Date Tue, 25 Apr 2017 01:55:08 GMT
Repository: asterixdb-site
Updated Branches:
  refs/heads/asf-site a7f7c6d17 -> 100cb803e


http://git-wip-us.apache.org/repos/asf/asterixdb-site/blob/100cb803/docs/0.9.1/udf.html
----------------------------------------------------------------------
diff --git a/docs/0.9.1/udf.html b/docs/0.9.1/udf.html
new file mode 100644
index 0000000..398e8b8
--- /dev/null
+++ b/docs/0.9.1/udf.html
@@ -0,0 +1,429 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2017-04-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20170424" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>AsterixDB &#x2013; Support for User Defined Functions in AsterixDB</title>
+    <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="./css/site.css" />
+    <link rel="stylesheet" href="./css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+        (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+        m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+        })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+        ga('create', 'UA-41536543-1', 'uci.edu');
+        ga('send', 'pageview');</script>
+          
+            </head>
+        <body class="topBarDisabled">
+          
+                
+                    
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                                  <a href="./" id="bannerLeft">
+                                                                                                <img src="images/asterixlogo.png"  alt="AsterixDB"/>
+                </a>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                  <li id="publishDate">Last Published: 2017-04-24</li>
+                      
+                
+                    
+                 <li id="projectVersion" class="pull-right">Version: 0.9.1</li>
+      
+                                            <li class="divider pull-right">|</li>
+                        
+    <li class="pull-right">              <a href="index.html" title="Documentation Home">
+        Documentation Home</a>
+  </li>
+
+                        </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span3">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+                    <li class="nav-header">Get Started - Installation</li>
+                                
+      <li>
+    
+                          <a href="ncservice.html" title="Option 1: using NCService">
+          <i class="none"></i>
+        Option 1: using NCService</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="ansible.html" title="Option 2: using Ansible">
+          <i class="none"></i>
+        Option 2: using Ansible</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aws.html" title="Option 3: using Amazon Web Services">
+          <i class="none"></i>
+        Option 3: using Amazon Web Services</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="yarn.html" title="Option 4: using YARN">
+          <i class="none"></i>
+        Option 4: using YARN</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="install.html" title="Option 5: using Managix (deprecated)">
+          <i class="none"></i>
+        Option 5: using Managix (deprecated)</a>
+            </li>
+                              <li class="nav-header">AsterixDB Primer</li>
+                                
+      <li>
+    
+                          <a href="sqlpp/primer-sqlpp.html" title="Option 1: using SQL++">
+          <i class="none"></i>
+        Option 1: using SQL++</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aql/primer.html" title="Option 2: using AQL">
+          <i class="none"></i>
+        Option 2: using AQL</a>
+            </li>
+                              <li class="nav-header">Data Model</li>
+                                
+      <li>
+    
+                          <a href="datamodel.html" title="The Asterix Data Model">
+          <i class="none"></i>
+        The Asterix Data Model</a>
+            </li>
+                              <li class="nav-header">Queries - SQL++</li>
+                                
+      <li>
+    
+                          <a href="sqlpp/manual.html" title="The SQL++ Query Language">
+          <i class="none"></i>
+        The SQL++ Query Language</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="sqlpp/builtins.html" title="Builtin Functions">
+          <i class="none"></i>
+        Builtin Functions</a>
+            </li>
+                              <li class="nav-header">Queries - AQL</li>
+                                
+      <li>
+    
+                          <a href="aql/manual.html" title="The Asterix Query Language (AQL)">
+          <i class="none"></i>
+        The Asterix Query Language (AQL)</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aql/builtins.html" title="Builtin Functions">
+          <i class="none"></i>
+        Builtin Functions</a>
+            </li>
+                              <li class="nav-header">API/SDK</li>
+                                
+      <li>
+    
+                          <a href="api.html" title="HTTP API">
+          <i class="none"></i>
+        HTTP API</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="csv.html" title="CSV Output">
+          <i class="none"></i>
+        CSV Output</a>
+            </li>
+                              <li class="nav-header">Advanced Features</li>
+                                
+      <li>
+    
+                          <a href="aql/fulltext.html" title="Support of Full-text Queries">
+          <i class="none"></i>
+        Support of Full-text Queries</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aql/externaldata.html" title="Accessing External Data">
+          <i class="none"></i>
+        Accessing External Data</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="feeds/tutorial.html" title="Support for Data Ingestion">
+          <i class="none"></i>
+        Support for Data Ingestion</a>
+            </li>
+                  
+      <li class="active">
+    
+            <a href="#"><i class="none"></i>User Defined Functions</a>
+          </li>
+                  
+      <li>
+    
+                          <a href="aql/filters.html" title="Filter-Based LSM Index Acceleration">
+          <i class="none"></i>
+        Filter-Based LSM Index Acceleration</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aql/similarity.html" title="Support of Similarity Queries">
+          <i class="none"></i>
+        Support of Similarity Queries</a>
+            </li>
+            </ul>
+                
+                    
+                
+          <hr class="divider" />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                                                                                                                         <a href="./" title="AsterixDB" class="builtBy">
+        <img class="builtBy"  alt="AsterixDB" src="images/asterixlogo.png"    />
+      </a>
+                      </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span9" >
+                                  
+            <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements.  See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership.  The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License.  You may obtain a copy of the License at
+ !
+ !   http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied.  See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>Support for User Defined Functions in AsterixDB</h1>
+<div class="section">
+<h2><a name="Table_of_Contents"></a><a name="atoc" id="#toc">Table of Contents</a></h2>
+
+<ul>
+  
+<li><a href="#PreprocessingCollectedData">Using UDFs to preprocess feed-collected data</a></li>
+  
+<li><a href="#WritingAnExternalUDF">Writing an External UDF</a></li>
+  
+<li><a href="#CreatingAnAsterixDBLibrary">Creating an AsterixDB Library</a></li>
+  
+<li><a href="#installingUDF">Installing an AsterixDB Library</a></li>
+</ul>
+<p>In this document, we describe the support for implementing, using, and installing user-defined functions (UDFs) in AsterixDB. We also explain how UDFs can be used, for example, to preprocess data collected using feeds (see the <a href="feeds/tutorial.html">feeds tutorial</a>).</p>
+<div class="section">
+<h3><a name="Installing_an_AsterixDB_Library"></a><a name="installingUDF">Installing an AsterixDB Library</a></h3>
+<p>We assume you have followed the <a href="../install.html">installation instructions</a> to set up a running AsterixDB instance. We will refer to your AsterixDB instance as &#x201c;my_asterix&#x201d;.</p>
+
+<ul>
+  
+<li>
+<p>Step 1: Stop the AsterixDB instance if it is in the ACTIVE state.</p>
+  
+<div class="source">
+<div class="source">
+<pre>$ managix stop -n my_asterix
+</pre></div></div></li>
+  
+<li>
+<p>Step 2: Install the library using the Managix install command. To illustrate its syntax, we first look it up with the help command:</p>
+  
+<div class="source">
+<div class="source">
+<pre>$ managix help  -cmd install
+Installs a library to an asterix instance.
+Options
+n  Name of Asterix Instance
+d  Name of the dataverse under which the library will be installed
+l  Name of the library
+p  Path to library zip bundle
+</pre></div></div></li>
+</ul>
+<p>The sample output above explains the usage and the required parameters. Each library has a name and is installed under a dataverse. Recall that we created a dataverse named &#x201c;feeds&#x201d; before creating our datatypes and dataset. We will name our library &#x201c;testlib&#x201d;.</p>
+<p>We assume you have a library zip bundle to install. To install the library, use the Managix install command, as in the example below.</p>
+
+<div class="source">
+<div class="source">
+<pre>    $ managix install -n my_asterix -d feeds -l testlib -p extlibs/asterix-external-data-0.8.7-binary-assembly.zip
+</pre></div></div>
+<p>You should see the following message:</p>
+
+<div class="source">
+<div class="source">
+<pre>    INFO: Installed library testlib
+</pre></div></div>
+<p>Next, start the AsterixDB instance using the start command, as shown below.</p>
+
+<div class="source">
+<div class="source">
+<pre>    $ managix start -n my_asterix
+</pre></div></div>
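+<p>The stop, install, and start steps above can also be scripted. The following Python sketch only assembles the Managix command lines, using the options listed by <tt>managix help -cmd install</tt> (<tt>-n</tt>, <tt>-d</tt>, <tt>-l</tt>, <tt>-p</tt>). The helper name <tt>managix_install_sequence</tt> is hypothetical, and the sketch assumes <tt>managix</tt> is on the <tt>PATH</tt> when the commands are actually run. The stop step is only needed if the instance is ACTIVE.</p>

```python
import shlex


def managix_install_sequence(instance, dataverse, library, bundle_path):
    """Return the Managix commands (as argv lists) that stop an instance,
    install a UDF library into it, and start it again."""
    return [
        # Step 1: stop the instance (only needed if it is ACTIVE).
        ["managix", "stop", "-n", instance],
        # Step 2: -n instance, -d dataverse, -l library name, -p zip bundle path.
        ["managix", "install", "-n", instance,
         "-d", dataverse, "-l", library, "-p", bundle_path],
        # Step 3: start the instance again.
        ["managix", "start", "-n", instance],
    ]


if __name__ == "__main__":
    steps = managix_install_sequence(
        "my_asterix", "feeds", "testlib",
        "extlibs/asterix-external-data-0.8.7-binary-assembly.zip")
    for argv in steps:
        # Printed for review; subprocess.run(argv, check=True) would execute it.
        print(shlex.join(argv))
```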
+<p>You may now use the AsterixDB library in AQL statements and queries. To look at the installed artifacts, you may execute the following queries in the AsterixDB web console.</p>
+
+<div class="source">
+<div class="source">
+<pre>    for $x in dataset Metadata.Function
+    return $x
+
+    for $x in dataset Metadata.Library
+    return $x
+</pre></div></div>
+<p>Our library is now installed and is ready to be used.</p></div></div>
+<div class="section">
+<h2><a name="Preprocessing_Collected_Data"></a><a name="PreprocessingCollectedData" id="PreprocessingCollectedData">Preprocessing Collected Data</a></h2>
+<p>In the following, we assume that you have already created the <tt>TwitterFeed</tt> and its corresponding data types and dataset by following the instructions in the <a href="feeds/tutorial.html">feeds tutorial</a>.</p>
+<p>A feed definition may optionally include the specification of a user-defined function that is to be applied to each feed object prior to persistence. Examples of pre-processing include adding attributes, filtering out objects, sampling, sentiment analysis, and feature extraction. Such pre-processing can be expressed as a UDF, defined either in AQL or in a programming language such as Java. An AQL UDF is a good fit when pre-processing an object requires the result of a query (join or aggregate) over data contained in AsterixDB datasets. More sophisticated processing, such as sentiment analysis of text, is better handled by a Java UDF. A Java UDF has an initialization phase that allows it to access any resources it needs to initialize itself before being used in a data flow. The AsterixDB compiler assumes a Java UDF to be stateless and thus treats it as an embarrassingly parallel black box. In contrast, the compiler can reason about an AQL UDF and make use of indexes during its invocation.</p>
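+<p>To illustrate why statelessness matters, here is a minimal Python sketch. It is not AsterixDB&#x2019;s Java UDF interface: the class <tt>StopwordFilterUDF</tt> and its <tt>initialize</tt>/<tt>evaluate</tt> methods are hypothetical stand-ins. Each parallel partition initializes its own copy once. Because <tt>evaluate</tt> depends only on its input object, processing the partitions independently gives the same result as processing all objects sequentially.</p>

```python
from concurrent.futures import ThreadPoolExecutor


class StopwordFilterUDF:
    """Hypothetical stand-in for a Java UDF: an initialization phase,
    followed by stateless evaluation of one object at a time."""

    def initialize(self):
        # Acquire resources once, before the UDF is used in the data flow.
        self.stopwords = {"the", "a", "an"}

    def evaluate(self, obj):
        # The result depends only on `obj` and the read-only resources, never on
        # previously seen objects, so copies can safely run in parallel.
        words = [w for w in obj["text"].split() if w.lower() not in self.stopwords]
        return {**obj, "text": " ".join(words)}


def run_partition(partition):
    # Each parallel instance initializes its own copy of the UDF.
    udf = StopwordFilterUDF()
    udf.initialize()
    return [udf.evaluate(obj) for obj in partition]


def run_parallel(partitions):
    # Evaluate partitions concurrently; map() preserves partition order.
    with ThreadPoolExecutor() as pool:
        results = pool.map(run_partition, partitions)
        return [obj for part in results for obj in part]
```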
+<p>We consider an example transformation of a raw tweet into its lightweight version called <tt>ProcessedTweet</tt>, which is defined next.</p>
+
+<div class="source">
+<div class="source">
+<pre>    use dataverse feeds;
+
+    create type ProcessedTweet if not exists as open {
+        id: string,
+        user_name:string,
+        location:point,
+        created_at:string,
+        message_text:string,
+        country: string,
+        topics: {{string}}
+    };
+
+    create dataset ProcessedTweets(ProcessedTweet)
+    primary key id;
+</pre></div></div>
+<p>Transforming a collected tweet into its lighter version of type <tt>ProcessedTweet</tt> involves extracting the topics or hashtags (if any) in the tweet and collecting them in the &#x201c;topics&#x201d; attribute. Additionally, the latitude and longitude values (doubles) are combined into the spatial point type. Note that spatial data types are first-class citizens in AsterixDB and support the creation of indexes. Next, we show a revised version of our example TwitterFeed that uses a UDF. We assume that the logic for transforming a tweet into a &#x201c;ProcessedTweet&#x201d; is available as a Java UDF inside an AsterixDB library named &#x2018;testlib&#x2019;. Installing such a library is described in the <a href="#installingUDF">Installing an AsterixDB Library</a> section, and the source of the example UDF is linked at the end of this section.</p>
+
+<div class="source">
+<div class="source">
+<pre>    use dataverse feeds;
+
+    create feed ProcessedTwitterFeed if not exists
+    using &quot;push_twitter&quot;
+    ((&quot;type-name&quot;=&quot;Tweet&quot;),
+    (&quot;consumer.key&quot;=&quot;************&quot;),
+    (&quot;consumer.secret&quot;=&quot;**************&quot;),
+    (&quot;access.token&quot;=&quot;**********&quot;),
+    (&quot;access.token.secret&quot;=&quot;*************&quot;))
+
+    apply function testlib#addHashTagsInPlace;
+</pre></div></div>
+<p>Note that feed adaptors and UDFs act as pluggable components. Together they provide a generic &#x201c;plug-and-play&#x201d; model in which custom implementations can be supplied to meet specific requirements.</p>
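+<p>The transformation performed by the example UDF can be approximated by the following Python sketch. It is not the Java code in <tt>testlib</tt>. The raw tweet field names (<tt>latitude</tt>, <tt>longitude</tt>, and so on), the hashtag pattern, and the (latitude, longitude) order of the point are all assumptions made for illustration. Only the output fields follow the <tt>ProcessedTweet</tt> type defined above.</p>

```python
import re

# Hypothetical hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def to_processed_tweet(raw):
    """Map a raw tweet (hypothetical field names) to the ProcessedTweet shape."""
    return {
        "id": raw["id"],
        "user_name": raw["user_name"],
        # Combine the two doubles into one point; the (latitude, longitude)
        # ordering is an assumption of this sketch.
        "location": (float(raw["latitude"]), float(raw["longitude"])),
        "created_at": raw["created_at"],
        "message_text": raw["message_text"],
        "country": raw["country"],
        # {{string}} is a multiset, so a list (duplicates allowed) is used here.
        "topics": HASHTAG.findall(raw["message_text"]),
    }
```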
+<div class="section">
+<div class="section">
+<h4><a name="Building_a_Cascade_Network_of_Feeds"></a>Building a Cascade Network of Feeds</h4>
+<p>Multiple high-level applications may wish to consume the data ingested from a data feed. Each such application might perceive the feed differently and require the arriving data to be processed and/or persisted differently. Building a separate flow of data from the external source for each application wastes resources, because the pre-processing or transformations required by the applications might overlap and could instead be done together in an incremental fashion to avoid redundancy. A single flow of data from the external source can provide data for multiple applications. To achieve this, AsterixDB introduces the notion of primary and secondary feeds.</p>
+<p>A feed in AsterixDB is considered a primary feed if it gets its data from an external data source. The objects contained in a feed (subsequent to any pre-processing) are directed to a designated AsterixDB dataset. Alternatively or additionally, these objects can be used to derive other feeds, known as secondary feeds. A secondary feed is similar to its parent feed in every other respect: it can have an associated UDF to allow for any subsequent processing, can be persisted into a dataset, and/or can be made to derive other secondary feeds to form a cascade network. A primary feed and its dependent secondary feeds form a hierarchy. As an example, the following AQL statements redefine the previous feed &#x201c;ProcessedTwitterFeed&#x201d; in terms of its parent feed (TwitterFeed).</p>
+
+<div class="source">
+<div class="source">
+<pre>    use dataverse feeds;
+
+    drop feed ProcessedTwitterFeed if exists;
+
+    create secondary feed ProcessedTwitterFeed from feed TwitterFeed
+    apply function testlib#addHashTags;
+
+    connect feed ProcessedTwitterFeed to dataset ProcessedTweets;
+</pre></div></div>
+<p>The <tt>addHashTags</tt> function is already provided in the example UDF. To see what objects are being inserted into the dataset, we can perform a simple dataset scan after allowing a few moments for the feed to start ingesting data:</p>
+
+<div class="source">
+<div class="source">
+<pre>    use dataverse feeds;
+
+    for $i in dataset ProcessedTweets limit 10 return $i;
+</pre></div></div>
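+<p>The primary/secondary feed hierarchy can be modeled with a toy Python class in a simplified in-memory setting. <tt>Feed</tt>, <tt>derive</tt>, <tt>connect</tt>, and <tt>push</tt> are hypothetical names, not AsterixDB APIs, and a UDF returning <tt>None</tt> to filter out an object is a convention of this sketch only. Each object arriving from the external source is processed once per feed and then fanned out to that feed&#x2019;s datasets and secondary feeds, so shared upstream work is not repeated.</p>

```python
class Feed:
    """Toy model of a feed node: applies an optional function, then fans out
    each resulting object to connected datasets and derived (secondary) feeds."""

    def __init__(self, name, function=None):
        self.name = name
        self.function = function  # optional per-object UDF
        self.datasets = []        # plain lists standing in for datasets
        self.children = []        # secondary feeds derived from this feed

    def derive(self, name, function=None):
        # Create a secondary feed whose input is this feed's output.
        child = Feed(name, function)
        self.children.append(child)
        return child

    def connect(self, dataset):
        # Persist this feed's output objects into `dataset`.
        self.datasets.append(dataset)

    def push(self, obj):
        if self.function is not None:
            obj = self.function(obj)
            if obj is None:       # the UDF filtered this object out
                return
        for dataset in self.datasets:
            dataset.append(obj)
        for child in self.children:
            child.push(obj)
```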
+<p>For an example of how to write a Java UDF from scratch, the source for the example UDF used in this tutorial is available <a class="externalLink" href="https://github.com/apache/asterixdb/tree/master/asterixdb/asterix-external-data/src/test/java/org/apache/asterix/external/library">here</a>.</p></div></div></div>
+<div class="section">
+<h2><a name="Unstalling_an_AsterixDB_Library"></a><a name="installingUDF">Uninstalling an AsterixDB Library</a></h2>
+<p>To uninstall a library, first stop the AsterixDB instance and then use the Managix uninstall command, as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre>    $ managix stop -n my_asterix
+
+    $ managix uninstall -n my_asterix -d feeds -l testlib
+</pre></div></div></div>
+                  </div>
+            </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container-fluid">
+              <div class="row span12">Copyright &copy;                    2017
+                        <a href="https://www.apache.org/">The Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+        feather logo, and the Apache AsterixDB project logo are either
+        registered trademarks or trademarks of The Apache Software
+        Foundation in the United States and other countries.
+        All other marks mentioned may be trademarks or registered
+        trademarks of their respective owners.</div>
+                  
+        
+                </div>
+    </footer>
+  </body>
+</html>

http://git-wip-us.apache.org/repos/asf/asterixdb-site/blob/100cb803/docs/0.9.1/yarn.html
----------------------------------------------------------------------
diff --git a/docs/0.9.1/yarn.html b/docs/0.9.1/yarn.html
new file mode 100644
index 0000000..7027bf4
--- /dev/null
+++ b/docs/0.9.1/yarn.html
@@ -0,0 +1,734 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2017-04-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20170424" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>AsterixDB &#x2013; Introduction</title>
+    <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="./css/site.css" />
+    <link rel="stylesheet" href="./css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+        (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+        m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+        })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+        ga('create', 'UA-41536543-1', 'uci.edu');
+        ga('send', 'pageview');</script>
+          
+            </head>
+        <body class="topBarDisabled">
+          
+                
+                    
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                                  <a href="./" id="bannerLeft">
+                                                                                                <img src="images/asterixlogo.png"  alt="AsterixDB"/>
+                </a>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                  <li id="publishDate">Last Published: 2017-04-24</li>
+                      
+                
+                    
+                 <li id="projectVersion" class="pull-right">Version: 0.9.1</li>
+      
+                                            <li class="divider pull-right">|</li>
+                        
+    <li class="pull-right">              <a href="index.html" title="Documentation Home">
+        Documentation Home</a>
+  </li>
+
+                        </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span3">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+                    <li class="nav-header">Get Started - Installation</li>
+                                
+      <li>
+    
+                          <a href="ncservice.html" title="Option 1: using NCService">
+          <i class="none"></i>
+        Option 1: using NCService</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="ansible.html" title="Option 2: using Ansible">
+          <i class="none"></i>
+        Option 2: using Ansible</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aws.html" title="Option 3: using Amazon Web Services">
+          <i class="none"></i>
+        Option 3: using Amazon Web Services</a>
+            </li>
+                  
+      <li class="active">
+    
+            <a href="#"><i class="none"></i>Option 4: using YARN</a>
+          </li>
+                  
+      <li>
+    
+                          <a href="install.html" title="Option 5: using Managix (deprecated)">
+          <i class="none"></i>
+        Option 5: using Managix (deprecated)</a>
+            </li>
+                              <li class="nav-header">AsterixDB Primer</li>
+                                
+      <li>
+    
+                          <a href="sqlpp/primer-sqlpp.html" title="Option 1: using SQL++">
+          <i class="none"></i>
+        Option 1: using SQL++</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aql/primer.html" title="Option 2: using AQL">
+          <i class="none"></i>
+        Option 2: using AQL</a>
+            </li>
+                              <li class="nav-header">Data Model</li>
+                                
+      <li>
+    
+                          <a href="datamodel.html" title="The Asterix Data Model">
+          <i class="none"></i>
+        The Asterix Data Model</a>
+            </li>
+                              <li class="nav-header">Queries - SQL++</li>
+                                
+      <li>
+    
+                          <a href="sqlpp/manual.html" title="The SQL++ Query Language">
+          <i class="none"></i>
+        The SQL++ Query Language</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="sqlpp/builtins.html" title="Builtin Functions">
+          <i class="none"></i>
+        Builtin Functions</a>
+            </li>
+                              <li class="nav-header">Queries - AQL</li>
+                                
+      <li>
+    
+                          <a href="aql/manual.html" title="The Asterix Query Language (AQL)">
+          <i class="none"></i>
+        The Asterix Query Language (AQL)</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aql/builtins.html" title="Builtin Functions">
+          <i class="none"></i>
+        Builtin Functions</a>
+            </li>
+                              <li class="nav-header">API/SDK</li>
+                                
+      <li>
+    
+                          <a href="api.html" title="HTTP API">
+          <i class="none"></i>
+        HTTP API</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="csv.html" title="CSV Output">
+          <i class="none"></i>
+        CSV Output</a>
+            </li>
+                              <li class="nav-header">Advanced Features</li>
+                                
+      <li>
+    
+                          <a href="aql/fulltext.html" title="Support of Full-text Queries">
+          <i class="none"></i>
+        Support of Full-text Queries</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aql/externaldata.html" title="Accessing External Data">
+          <i class="none"></i>
+        Accessing External Data</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="feeds/tutorial.html" title="Support for Data Ingestion">
+          <i class="none"></i>
+        Support for Data Ingestion</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="udf.html" title="User Defined Functions">
+          <i class="none"></i>
+        User Defined Functions</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aql/filters.html" title="Filter-Based LSM Index Acceleration">
+          <i class="none"></i>
+        Filter-Based LSM Index Acceleration</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="aql/similarity.html" title="Support of Similarity Queries">
+          <i class="none"></i>
+        Support of Similarity Queries</a>
+            </li>
+            </ul>
+                
+                    
+                
+          <hr class="divider" />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                                                                                                                         <a href="./" title="AsterixDB" class="builtBy">
+        <img class="builtBy"  alt="AsterixDB" src="images/asterixlogo.png"    />
+      </a>
+                      </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span9" >
+                                  
+            <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements.  See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership.  The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License.  You may obtain a copy of the License at
+ !
+ !   http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied.  See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>Introduction</h1>
+<div class="section">
+<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
+
+<ul>
+  
+<li><a href="#arch">Architecture Overview</a></li>
+  
+<li><a href="#prereq">Prerequisites</a></li>
+  
+<li><a href="#tut">Tutorial Installation</a></li>
+  
+<li><a href="#faq">FAQ and Common Issues</a></li>
+  
+<li><a href="#detail">Reference guide to AsterixDB&#x2019;s YARN Client</a></li>
+</ul>
+<p>This is a guide describing how to deploy AsterixDB onto a YARN-based environment.</p></div>
+<div class="section">
+<h2><a name="AsterixDB_and_the_YARN_environment"></a><a name="arch" id="arch">AsterixDB and the YARN environment</a></h2>
+<p>AsterixDB uses a shared-nothing architecture and local file-based storage, not HDFS. Hence it relies on the local storage of each node (&#x2018;iodevices&#x2019; in AsterixDB). In YARN there are three main types of storage available:</p>
+
+<ul>
+  
+<li>HDFS file storage (only suitable for long-lived artifacts, can be slower than local disk)</li>
+  
+<li>Ephemeral container storage that is cleaned by YARN after a container exits (unsuitable except for transient artifacts)</li>
+  
+<li>Node-local destinations not managed by YARN, but which are accessible by the container and live beyond its termination.</li>
+</ul>
+<p>AsterixDB uses only the last type of storage, which is available with both the DefaultContainerExecutor and the LinuxContainerExecutor. However, keep in mind that with the DefaultContainerExecutor the directory must be accessible by the user that the YARN NodeManager process runs as, while with the LinuxContainerExecutor it must be accessible by the Unix user who is running the job.</p></div>
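+<p>Because the node-local directories must be accessible to the right account, a quick check before deployment can help. This Python sketch (the function name <tt>usable_as_iodevice</tt> is hypothetical) tests whether a path is a directory that the current process can read, write into, and traverse. Run it on each node as the NodeManager user (DefaultContainerExecutor) or as the user submitting the job (LinuxContainerExecutor).</p>

```python
import os
import sys


def usable_as_iodevice(path):
    """Return True if `path` is an existing directory that the current process
    can read, write into, and traverse."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Pass one or more candidate iodevice directories on the command line.
    for candidate in sys.argv[1:]:
        status = "OK" if usable_as_iodevice(candidate) else "NOT accessible"
        print(f"{candidate}: {status}")
```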
+<div class="section">
+<h2><a name="Prerequisites"></a><a name="prereq" id="prereq">Prerequisites</a></h2>
+<p>For this tutorial we assume a YARN cluster with the proper environment variables set. To test this, try running the DistributedShell example that is distributed as part of Apache Hadoop. If that sample application runs successfully, the environment should be acceptable for launching AsterixDB onto your YARN-enabled cluster.</p>
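+<p>One way to run that smoke test is sketched below in Python. The helper <tt>distributed_shell_command</tt> is hypothetical. The sketch assumes the Hadoop 2.x binary layout, where the DistributedShell jar lives under <tt>$HADOOP_HOME/share/hadoop/yarn/</tt>. It builds a command that asks YARN to run <tt>date</tt> in a few containers using the sample's <tt>Client</tt> class.</p>

```python
import glob
import os


def distributed_shell_command(hadoop_home: str, num_containers: int = 2) -> list[str]:
    """Build a command line that runs Hadoop's DistributedShell sample.

    Assumes the Hadoop 2.x layout: $HADOOP_HOME/share/hadoop/yarn/ holds
    hadoop-yarn-applications-distributedshell-VERSION.jar.
    """
    pattern = os.path.join(hadoop_home, "share", "hadoop", "yarn",
                           "hadoop-yarn-applications-distributedshell-*.jar")
    # Ignore test jars in case the distribution ships one next to the main jar.
    jars = sorted(j for j in glob.glob(pattern) if not j.endswith("-tests.jar"))
    if not jars:
        raise FileNotFoundError(f"no DistributedShell jar matches {pattern}")
    jar = jars[-1]
    return [
        "yarn", "jar", jar,
        "org.apache.hadoop.yarn.applications.distributedshell.Client",
        "-jar", jar,                  # jar shipped to the ApplicationMaster
        "-shell_command", "date",     # trivial command run in each container
        "-num_containers", str(num_containers),
    ]
```

+<p>It could be run with <tt>subprocess.run(distributed_shell_command(os.environ["HADOOP_HOME"]), check=True)</tt>. If the application finishes successfully in the Resource Manager, the cluster is likely ready for AsterixDB.</p>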
+<div class="section">
+<h3><a name="Vagrant_and_Puppet_Virtualized_cluster_for_Tutorial"></a>Vagrant and Puppet Virtualized cluster for Tutorial</h3>
+<p>For the purposes of this tutorial, a virtualized cluster that matches all of the tutorial configurations can be found at <a class="externalLink" href="https://github.com/parshimers/yarn-sample-cluster">https://github.com/parshimers/yarn-sample-cluster</a>. It requires a machine with about 4-8GB of RAM to run. To start with this cluster, first clone the repository:</p>
+
+<div class="source">
+<div class="source">
+<pre>    &#x21aa; git clone https://github.com/parshimers/yarn-sample-cluster.git
+    Cloning into 'yarn-sample-cluster'...
+    remote: Counting objects: 490, done.
+    remote: Compressing objects: 100% (315/315), done.
+    remote: Total 490 (delta 152), reused 490 (delta 152)
+    Receiving objects: 100% (490/490), 521.34 KiB | 201.00 KiB/s, done.
+    Resolving deltas: 100% (152/152), done.
+    Checking connectivity... done.
+</pre></div></div>
+<p>If the &#x2018;hostmanager&#x2019; plugin for Vagrant isn&#x2019;t already installed, install it like so:</p>
+
+<div class="source">
+<div class="source">
+<pre>    &#x21aa; vagrant plugin install vagrant-hostmanager
+    Installing the 'vagrant-hostmanager' plugin. This can take a few minutes...
+    Installed the plugin 'vagrant-hostmanager (1.5.0)'!
+</pre></div></div>
+<p>Then start the tutorial cluster. The hostmanager plugin may ask for sudo at some point, because it updates your hosts file to include the virtual machines.</p>
+
+<div class="source">
+<div class="source">
+<pre>    &#x21aa; vagrant up
+    Bringing machine 'nc2' up with 'virtualbox' provider...
+    Bringing machine 'nc1' up with 'virtualbox' provider...
+    Bringing machine 'cc' up with 'virtualbox' provider...
+    ...
+</pre></div></div>
+<p>Once Vagrant returns, the environment will be ready. The working directory containing the Vagrantfile is also visible to each of the virtual machines (in the /vagrant directory), so we will unzip the AsterixDB binaries here as well for easy access. The YARN binary can be found on the AsterixDB <a class="externalLink" href="https://asterixdb.apache.org/download.html">downloads page</a>.</p>
+
+<div class="source">
+<div class="source">
+<pre>&#x21aa; unzip -d asterix-yarn/ asterix-yarn-binary-assembly.zip
+...
+</pre></div></div>
+<p>To log into the node from which we will run the rest of the tutorial, use &#x2018;vagrant ssh&#x2019; to get to the CC node and move to the YARN client&#x2019;s location:</p>
+
+<div class="source">
+<div class="source">
+<pre>    &#x21aa; vagrant ssh cc
+    [vagrant@cc ~]$
+    [vagrant@cc ~]$ cd /vagrant/asterix-yarn
+    [vagrant@cc asterix-yarn]$ 
+</pre></div></div>
+<h1><a name="tut" id="tut">Tutorial installation</a></h1></div></div>
+<div class="section">
+<h2><a name="Configuration"></a>Configuration</h2>
+<p>To deploy AsterixDB onto a YARN cluster, we need to construct a configuration file that describes the resources that will be requested from YARN for AsterixDB. </p>
+
+<div class="source">
+
+<div class="source">
+<pre>
+<img src="images/yarn_clust.png" alt="Illustration of a simple YARN cluster with AsterixDB processes." />
+<i>Fig. 1</i>:  Illustration of a simple YARN cluster with AsterixDB processes and their locations
+</pre></div>
+</div>
+<p>The following AsterixDB cluster description file corresponds to the deployment scenario shown above.</p>
+
+<div class="source">
+<div class="source">
+<pre>    &lt;cluster xmlns=&quot;yarn_cluster&quot;&gt;
+        &lt;name&gt;my_awesome_instance&lt;/name&gt;
+        &lt;txn_log_dir&gt;/home/yarn/&lt;/txn_log_dir&gt;
+        &lt;iodevices&gt;/home/yarn/&lt;/iodevices&gt;
+        &lt;store&gt;asterix-data&lt;/store&gt;
+        &lt;master_node&gt;
+            &lt;id&gt;cc&lt;/id&gt;
+            &lt;client_ip&gt;10.10.0.2&lt;/client_ip&gt;
+            &lt;cluster_ip&gt;10.10.0.2&lt;/cluster_ip&gt;
+            &lt;client_port&gt;1098&lt;/client_port&gt;
+            &lt;cluster_port&gt;1099&lt;/cluster_port&gt;
+            &lt;http_port&gt;8888&lt;/http_port&gt;
+        &lt;/master_node&gt;
+        &lt;node&gt;
+            &lt;id&gt;nc1&lt;/id&gt;
+            &lt;cluster_ip&gt;10.10.0.3&lt;/cluster_ip&gt;
+        &lt;/node&gt;
+        &lt;node&gt;
+            &lt;id&gt;nc2&lt;/id&gt;
+            &lt;cluster_ip&gt;10.10.0.4&lt;/cluster_ip&gt;
+        &lt;/node&gt;
+        &lt;metadata_node&gt;nc1&lt;/metadata_node&gt;
+    &lt;/cluster&gt;
+</pre></div></div>
+<p>In this example we have two NCs (nc1 and nc2) and one CC. Each node is defined by a unique name (not necessarily its hostname) and an IP on which AsterixDB nodes will listen and communicate with each other; this is the &#x2018;cluster_ip&#x2019; parameter. The &#x2018;client_ip&#x2019; parameter is the interface on which client-facing services, such as the web interface, are presented. For the next step, this file will be saved as &#x2018;my_awesome_cluster_desc.xml&#x2019; in the configs directory.</p></div>
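+<p>A few properties of this file can be checked before running <tt>install</tt>. The following Python sketch uses a hypothetical function, <tt>validate_cluster_desc</tt>. It parses the description with <tt>xml.etree.ElementTree</tt> and checks that node names are unique and that every node has a <tt>cluster_ip</tt>. It also checks that <tt>metadata_node</tt> names one of the declared NC nodes. That last rule is an assumption based on the example above, not a documented requirement.</p>

```python
import xml.etree.ElementTree as ET

# The description declares xmlns="yarn_cluster", so every element lives in
# that namespace and must be looked up with a prefix.
NS = {"c": "yarn_cluster"}


def validate_cluster_desc(xml_text: str) -> list[str]:
    """Return consistency problems found in a YARN cluster description.

    Hypothetical checker for illustration; AsterixDB performs its own parsing.
    """
    root = ET.fromstring(xml_text)
    problems = []

    master = root.find("c:master_node", NS)
    if master is None:
        problems.append("missing master_node element")
        ids = []
    else:
        ids = [master.findtext("c:id", default="", namespaces=NS).strip()]

    nc_ids = []
    for node in root.findall("c:node", NS):
        node_id = node.findtext("c:id", default="", namespaces=NS).strip()
        if not node_id:
            problems.append("a node element has no id")
            continue
        if not node.findtext("c:cluster_ip", default="", namespaces=NS).strip():
            problems.append(f"node {node_id} has no cluster_ip")
        nc_ids.append(node_id)
    ids += nc_ids

    # Each node must be defined by a unique name.
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        problems.append(f"duplicate ids: {', '.join(duplicates)}")

    metadata = root.findtext("c:metadata_node", default="", namespaces=NS).strip()
    if metadata not in nc_ids:
        problems.append(f"metadata_node {metadata!r} is not a declared node")
    return problems
```

+<p>For the description above, <tt>validate_cluster_desc</tt> should return an empty list. Renaming <tt>metadata_node</tt> to an undeclared node would produce a single problem.</p>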
+<div class="section">
+<h2><a name="Installing_and_starting_the_instance"></a>Installing and starting the instance</h2>
+<p>With this configuration in hand, the YARN client can be used to deploy AsterixDB onto the cluster:</p>
+
+<div class="source">
+<div class="source">
+<pre>    [vagrant@cc asterix-yarn]$ bin/asterix -n my_awesome_instance -c configs/my_awesome_cluster_desc.xml install
+    Waiting for new AsterixDB Instance to start  .
+    Asterix successfully deployed and is now running.
+</pre></div></div>
+<p>The instance will be visible in the YARN Resource Manager (RM), similar to the image below:</p>
+
+<div class="source">
+<div class="source">
+<pre> <img src="images/running_inst.png" alt="Hadoop YARN Resource Manager dashboard with a running AsterixDB instance." /> <i>Fig. 2</i>: Hadoop YARN Resource Manager dashboard with running AsterixDB instance </pre></div> </div>
+<p>Once the client reports success, the instance is ready to be used. The AsterixDB instance can now be reached at the CC&#x2019;s IP (10.10.0.2) on the default port (19001).</p>
+
+<div class="source">
+
+<div class="source">
+<pre>
+<img src="images/asterix_webui.png" alt="AsterixDB Web User Interface." />
+<i>Fig. 3</i>:  AsterixDB Web User Interface
+</pre></div>
+</div>
+<p>From here, to try things out we could run the ADM &amp; AQL 101 tutorial or any other sample workload.</p></div>
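+<p>When scripting a deployment, it can help to wait until the web interface at the CC's IP and port 19001 answers before sending work to it. The following Python sketch uses a hypothetical helper, <tt>wait_for_web_ui</tt>. It polls a URL with <tt>urllib</tt> and treats any HTTP response, including an error status, as a sign that the server is up. It does not confirm that the instance is fully healthy.</p>

```python
import time
import urllib.error
import urllib.request


def wait_for_web_ui(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it answers an HTTP request or `timeout_s` elapses.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused, unreachable host, or request timeout
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

+<p>For the tutorial cluster, a call such as <tt>wait_for_web_ui("http://10.10.0.2:19001/")</tt> would return <tt>True</tt> once the interface shown in Fig. 3 is reachable.</p>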
+<div class="section">
+<h2><a name="Stopping_the_instance"></a>Stopping the instance</h2>
+<p>To stop the instance that was just deployed, the <tt>stop</tt> command is used:</p>
+
+<div class="source">
+<div class="source">
+<pre>    [vagrant@cc asterix-yarn]$ bin/asterix -n my_awesome_instance stop
+    Stopping instance my_awesome_instance
+</pre></div></div>
+<p>This attempts a graceful shutdown of the instance. If for some reason this does not succeed, the <tt>kill</tt> action can be used to force shutdown in a similar fashion:</p>
+
+<div class="source">
+<div class="source">
+<pre>    [vagrant@cc asterix-yarn]$ bin/asterix -n my_awesome_instance kill
+    Are you sure you want to kill this instance? In-progress tasks will be aborted
+    Are you sure you want to do this? (yes/no): yes
+</pre></div></div></div>
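+<p>The stop-then-kill procedure above can be scripted. The following Python sketch uses a hypothetical function, <tt>shutdown_instance</tt>, that wraps <tt>bin/asterix</tt> with <tt>subprocess</tt>. It rests on two assumptions this guide does not confirm: that the client exits with a non-zero status when an action fails, and that <tt>kill</tt> reads its yes/no confirmation from standard input. The command runner can be swapped out, so the decision logic can be tried without a cluster.</p>

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner executes a command, optionally feeding text to stdin, and returns
# its exit status. It is injectable so the logic can be tested offline.
Runner = Callable[[Sequence[str], Optional[str]], int]


def run_command(cmd: Sequence[str], stdin_text: Optional[str]) -> int:
    return subprocess.run(list(cmd), input=stdin_text, text=True).returncode


def shutdown_instance(name: str, asterix: str = "bin/asterix",
                      runner: Runner = run_command) -> str:
    """Try a graceful stop first and fall back to kill if it fails.

    Assumes (unverified) that the client exits non-zero on failure and that
    kill reads its confirmation from stdin.
    """
    if runner([asterix, "-n", name, "stop"], None) == 0:
        return "stopped"
    # kill aborts in-progress tasks, so it is only tried after stop failed.
    if runner([asterix, "-n", name, "kill"], "yes\n") == 0:
        return "killed"
    return "failed"
```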
+<div class="section">
+<h2><a name="Managing_stopped_instances"></a>Managing stopped instances</h2>
+<p>After the instance is stopped, no containers are allocated on any YARN NodeManager. However, the state of the instance is still persisted in the iodevices and transaction log folders on the local disks of each machine where a Node Controller was deployed (and, to a lesser extent, on HDFS). Every instance, running or not, can be viewed via the <tt>describe</tt> action:</p>
+
+<div class="source">
+<div class="source">
+<pre>    [vagrant@cc asterix-yarn]$ bin/asterix describe
+    Existing AsterixDB instances:
+    Instance my_awesome_instance is stopped
+</pre></div></div></div>
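+<p>If a script needs to know whether an instance is running, it can parse the <tt>describe</tt> output. The following Python sketch uses a hypothetical function, <tt>parse_describe</tt>. It assumes that every instance is reported on a line of the form <tt>Instance NAME is STATE</tt>. Only the <tt>stopped</tt> form appears in this guide, so other states may be worded differently.</p>

```python
import re

# Matches lines such as "Instance my_awesome_instance is stopped".
_INSTANCE_LINE = re.compile(r"^\s*Instance\s+(\S+)\s+is\s+(\S+)\s*$")


def parse_describe(output: str) -> dict[str, str]:
    """Map instance names to the state reported by `asterix describe`.

    Hypothetical parser; the output format is assumed from the example above.
    """
    states = {}
    for line in output.splitlines():
        m = _INSTANCE_LINE.match(line)
        if m:
            states[m.group(1)] = m.group(2)
    return states
```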
+<div class="section">
+<h2><a name="Starting_inactive_instances"></a>Starting inactive instances</h2>
+<p>To start the instance again, the <tt>start</tt> action is used:</p>
+
+<div class="source">
+<div class="source">
+<pre>    [vagrant@cc asterix-yarn]$ bin/asterix -n my_awesome_instance start
+    Waiting for AsterixDB instance to resume .
+    Asterix successfully deployed and is now running.
+</pre></div></div></div>
+<div class="section">
+<h2><a name="Shutting_down_vagrant"></a>Shutting down vagrant</h2>
+<p>To stop the virtual machines, issue the vagrant halt command from the host machine in the folder containing the Vagrantfile:</p>
+
+<div class="source">
+<div class="source">
+<pre>    &#x21aa; vagrant halt
+</pre></div></div>
+<h1><a name="detail" id="detail">Listing of Commands and Options</a></h1></div>
+<div class="section">
+<h2><a name="Overview"></a>Overview</h2>
+<p>All commands take the following format:</p>
+
+<div class="source">
+<div class="source">
+<pre>    asterix [action-specific option] [action]
+</pre></div></div>
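+<p>When the client is driven from another program, the command format above maps directly onto an argument list. The following Python sketch uses a hypothetical function, <tt>asterix_argv</tt>. It places the options first and the action last, matching the <tt>install</tt> invocation used in the tutorial. Flags that take no value, such as <tt>-f</tt> or <tt>-r</tt>, are given the value <tt>None</tt>.</p>

```python
def asterix_argv(action: str, options: dict, launcher: str = "bin/asterix") -> list[str]:
    """Assemble `asterix [action-specific option] [action]` as an argv list.

    `options` maps flags to values; use None for flags without a value.
    Hypothetical helper for illustration only.
    """
    argv = [launcher]
    for flag, value in options.items():
        argv.append(flag)
        if value is not None:
            argv.append(value)
    argv.append(action)  # the action always comes last
    return argv
```

+<p>For example, <tt>asterix_argv("install", {"-n": "my_awesome_instance", "-c": "configs/my_awesome_cluster_desc.xml"})</tt> reproduces the tutorial's install command and can be passed to <tt>subprocess.run</tt>.</p>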
+<div class="section">
+<h3><a name="Technical_details"></a>Technical details</h3>
+<p>AsterixDB&#x2019;s YARN client statically allocates containers on NodeManagers based on their IP addresses. The ApplicationMaster (AM) and Cluster Controller (CC) processes are currently not integrated in any way.</p>
+<p>The <tt>asterix</tt> command itself is simply a launcher script around the AsterixClient Java class that supplies time-saving default parameters. It is also possible to run the client directly with <tt>java -jar</tt>, given the correct options.</p></div></div>
+<div class="section">
+<h2><a name="Actions"></a>Actions</h2>
+<p>Below is a description of the various actions available via the AsterixDB YARN client:</p>
+
+<table border="0" class="table table-striped">
+  <thead>
+    
+<tr class="a">
+      
+<th>Action </th>
+      
+<th>Description </th>
+    </tr>
+  </thead>
+  <tbody>
+    
+<tr class="b">
+      
+<td><tt>start</tt> </td>
+      
+<td>Starts an existing instance specified by the -name flag </td>
+    </tr>
+    
+<tr class="a">
+      
+<td><tt>install</tt> </td>
+      
+<td>Deploys and starts an AsterixDB instance described by the config specified in the -c parameter, and named by the -n parameter </td>
+    </tr>
+    
+<tr class="b">
+      
+<td><tt>stop</tt> </td>
+      
+<td>Attempts graceful shutdown of an AsterixDB instance specified in the -name parameter </td>
+    </tr>
+    
+<tr class="a">
+      
+<td><tt>kill</tt> </td>
+      
+<td>Forcefully stops an instance by asking YARN to terminate all of its containers. </td>
+    </tr>
+    
+<tr class="b">
+      
+<td><tt>destroy</tt> </td>
+      
+<td>Removes the instance specified by -name and all of its stored resources from the cluster </td>
+    </tr>
+    
+<tr class="a">
+      
+<td><tt>describe</tt> </td>
+      
+<td>Shows all instances, running or not, visible to the AsterixDB YARN client </td>
+    </tr>
+    
+<tr class="b">
+      
+<td><tt>backup</tt> </td>
+      
+<td>Copies the artifacts from a stopped instance to another directory on HDFS so that the instance can be reverted to that state </td>
+    </tr>
+    
+<tr class="a">
+      
+<td><tt>restore</tt> </td>
+      
+<td>Restores an instance to the state saved in a snapshot </td>
+    </tr>
+    
+<tr class="b">
+      
+<td><tt>lsbackup</tt> </td>
+      
+<td>Lists the stored snapshots from an instance </td>
+    </tr>
+    
+<tr class="a">
+      
+<td><tt>rmbackup</tt> </td>
+      
+<td>Removes a snapshot from HDFS </td>
+    </tr>
+    
+<tr class="b">
+      
+<td><tt>libinstall</tt></td>
+      
+<td>Installs an external library or UDF for use in queries </td>
+    </tr>
+  </tbody>
+</table></div>
+<div class="section">
+<h2><a name="Options"></a>Options</h2>
+<p>Below are all available options and the actions to which they apply:</p>
+
+<table border="0" class="table table-striped">
+  <thead>
+    
+<tr class="a">
+      
+<th>Option </th>
+      
+<th>Long Form </th>
+      
+<th>Short Form </th>
+      
+<th>Usage </th>
+      
+<th>Applicability </th>
+    </tr>
+  </thead>
+  <tbody>
+    
+<tr class="b">
+      
+<td>Configuration Path </td>
+      
+<td><tt>-asterixConf</tt> </td>
+      
+<td><tt>-c</tt> </td>
+      
+<td><tt>-c [/path/to/file]</tt>. Path to an AsterixDB Cluster Description File </td>
+      
+<td>Only required with <tt>create</tt>. A configuration in DFS defines the existence of an instance. </td>
+    </tr>
+    
+<tr class="a">
+      
+<td>Instance Name </td>
+      
+<td><tt>-name</tt> </td>
+      
+<td><tt>-n</tt> </td>
+      
+<td><tt>-n [instance name]</tt> Name/Identifier for instance. </td>
+      
+<td>Required for all actions except <tt>describe</tt> and <tt>lsbackup</tt> </td>
+    </tr>
+    
+<tr class="b">
+      
+<td>Asterix Binary Path </td>
+      
+<td><tt>-asterixTar</tt> </td>
+      
+<td><tt>-tar</tt> </td>
+      
+<td><tt>-tar [/path/to/binary]</tt> Path to asterix-server binary. </td>
+      
+<td>This is the AsterixDB server binary that is distributed via the DFS and run in the YARN containers. It is usually set by default via the launcher script and cached for each instance. It can be set manually, but is only used in <tt>create</tt> and <tt>install</tt> with <tt>-r</tt>. </td>
+    </tr>
+    
+<tr class="a">
+      
+<td>Force </td>
+      
+<td><tt>-force</tt> </td>
+      
+<td><tt>-f</tt> </td>
+      
+<td><tt>-f</tt>. Use at your own risk. Disables any sanity-checking during an action. </td>
+      
+<td>Can be applied to any action, but is mostly useful in cases where an instance cannot be removed properly via <tt>destroy</tt> and cleanup of DFS files is desired. </td>
+    </tr>
+    
+<tr class="b">
+      
+<td>Refresh </td>
+      
+<td><tt>-refresh</tt> </td>
+      
+<td><tt>-r</tt> </td>
+      
+<td><tt>-r</tt>. Replaces cached binary with one mentioned in <tt>-tar</tt>. </td>
+      
+<td>This only has an effect with the <tt>start</tt> action. It can be used to replace/upgrade the binary cached for an instance on the DFS. </td>
+    </tr>
+    
+<tr class="a">
+      
+<td>Base Parameters </td>
+      
+<td><tt>-baseConf</tt> </td>
+      
+<td><tt>-bc</tt> </td>
+      
+<td><tt>-bc [path/to/params]</tt>. Specifies parameter file to use during instance creation/alteration. </td>
+      
+<td>This file specifies various internal properties of the AsterixDB system, such as buffer cache size and page size, among many others. It can be helpful to tweak parameters in this file; however, take care to keep them at sane values. Only used during <tt>alter</tt> and <tt>create</tt>. </td>
+    </tr>
+    
+<tr class="b">
+      
+<td>External library path </td>
+      
+<td><tt>-externalLibs</tt> </td>
+      
+<td><tt>-l</tt> </td>
+      
+<td><tt>-l [path/to/library]</tt>. Specifies an external library to upload to an existing instance. </td>
+      
+<td>Only used in <tt>libinstall</tt>. Specifies the file containing the external function to install </td>
+    </tr>
+    
+<tr class="a">
+      
+<td>External library dataverse </td>
+      
+<td><tt>-libDataverse</tt> </td>
+      
+<td><tt>-ld</tt> </td>
+      
+<td><tt>-ld [existing dataverse name]</tt> </td>
+      
+<td>Only used in <tt>libinstall</tt>. Specifies the dataverse into which the library given by the <tt>-l</tt> option is installed. </td>
+    </tr>
+    
+<tr class="b">
+      
+<td>Snapshot ID </td>
+      
+<td><tt>-snapshot</tt> </td>
+      
+<td>[none] </td>
+      
+<td><tt>-snapshot [backup timestamp/ID]</tt> </td>
+      
+<td>Used with <tt>rmbackup</tt> and <tt>restore</tt> to specify which backup to perform the respective operation on. </td>
+    </tr>
+  </tbody>
+</table>
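+<p>The applicability column above can be turned into a pre-flight check that catches a missing flag before the client is launched. The following Python sketch uses hypothetical names (<tt>REQUIRED</tt>, <tt>LONG_TO_SHORT</tt>, <tt>missing_options</tt>). <tt>-n</tt> is required everywhere except <tt>describe</tt> and <tt>lsbackup</tt>, as the table states. Requiring <tt>-c</tt> for <tt>install</tt> follows the actions table. Treating <tt>-l</tt>, <tt>-ld</tt> and <tt>-snapshot</tt> as required, rather than merely accepted, is an assumption.</p>

```python
# Required options per action, as read from the tables above. Treating -l,
# -ld and -snapshot as required (not merely accepted) is an assumption.
REQUIRED = {
    "install": {"-n", "-c"},
    "start": {"-n"},
    "stop": {"-n"},
    "kill": {"-n"},
    "destroy": {"-n"},
    "backup": {"-n"},
    "restore": {"-n", "-snapshot"},
    "rmbackup": {"-n", "-snapshot"},
    "libinstall": {"-n", "-l", "-ld"},
    "describe": set(),
    "lsbackup": set(),
}

# Long forms from the Options table, normalized to their short forms.
LONG_TO_SHORT = {"-name": "-n", "-asterixConf": "-c", "-asterixTar": "-tar",
                 "-force": "-f", "-refresh": "-r", "-baseConf": "-bc",
                 "-externalLibs": "-l", "-libDataverse": "-ld"}


def missing_options(action: str, given: set[str]) -> set[str]:
    """Return the required short-form flags absent from `given`."""
    if action not in REQUIRED:
        raise ValueError(f"unknown action: {action}")
    normalized = {LONG_TO_SHORT.get(flag, flag) for flag in given}
    return REQUIRED[action] - normalized
```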
+<h1><a name="faq" id="faq">Frequently Asked Questions and Common Issues</a></h1>
+<div class="section">
+<h3><a name="Q:_Where_are_the_AsterixDB_logs_located"></a>Q: Where are the AsterixDB logs located?</h3>
+<p>A: YARN manages the logs for each container. They are visible in the YARN Resource Manager&#x2019;s web interface or through the Hadoop command-line utilities (see <a class="externalLink" href="http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/">http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/</a> for more details).</p></div>
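+<p>On the command line, aggregated container logs are retrieved with Hadoop's <tt>yarn logs -applicationId</tt>, given the application ID shown for the AsterixDB instance in the Resource Manager. This typically requires YARN log aggregation (<tt>yarn.log-aggregation-enable</tt>) to be turned on. The following Python sketch wraps that command. The function names are hypothetical, and the ID check assumes the usual <tt>application_</tt> prefix followed by two numeric fields.</p>

```python
import re
import subprocess

# YARN application IDs look like application_ + cluster timestamp + _ + sequence.
_APP_ID = re.compile(r"^application_\d+_\d+$")


def yarn_logs_command(app_id: str) -> list[str]:
    """Build the `yarn logs` invocation for one application (hypothetical helper)."""
    if not _APP_ID.match(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_logs(app_id: str) -> str:
    """Return the aggregated container logs printed by `yarn logs`."""
    result = subprocess.run(yarn_logs_command(app_id), capture_output=True,
                            text=True, check=True)
    return result.stdout
```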
+<div class="section">
+<h3><a name="Q:_Why_does_AsterixDB_fail_to_start_and_the_logs_contain_errors_like_Container_is_running_beyond_virtual_memory_limits._"></a>Q: Why does AsterixDB fail to start, and the logs contain errors like &#x2018;Container is running beyond virtual memory limits.&#x2019; ?</h3>
+<p>A: This is a quirk of YARN&#x2019;s memory management that can be observed on certain operating systems (mainly CentOS). It is benign unless it causes problems of this type. A workaround is to set <tt>yarn.nodemanager.vmem-check-enabled</tt> to <tt>false</tt> in the yarn-site.xml configuration for Hadoop YARN. This makes the NodeManagers skip the virtual memory check entirely and instead rely on resident set size to check the memory usage of containers.</p></div>
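+<p>The setting can be applied by hand or with a small script. The following Python sketch uses a hypothetical function, <tt>set_yarn_property</tt>, which relies on the standard Hadoop configuration layout: a <tt>configuration</tt> root element holding <tt>property</tt> elements, each with a <tt>name</tt> and a <tt>value</tt>. It updates the property if present and appends it otherwise. <tt>ElementTree</tt> drops comments and the XML declaration, so keep a backup of the original file. The NodeManagers typically need a restart before the change takes effect.</p>

```python
import xml.etree.ElementTree as ET


def set_yarn_property(xml_text: str, name: str, value: str) -> str:
    """Return yarn-site.xml content with property `name` set to `value`.

    Updates an existing property element with that name, or appends a new
    one. Hypothetical helper; comments in the input are not preserved.
    """
    root = ET.fromstring(xml_text)  # the configuration element
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry: append a new property element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

+<p>For example, <tt>set_yarn_property(text, "yarn.nodemanager.vmem-check-enabled", "false")</tt> returns the updated configuration, which can then be written back to yarn-site.xml on each NodeManager host.</p>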
+<div class="section">
+<h3><a name="Q:_How_do_I_upgrade_my_existing_instance"></a>Q: How do I upgrade my existing instance?</h3>
+<p>A: This is a complex question. Generally, one can use the <tt>-refresh</tt> option to upgrade the version of an extant AsterixDB instance. However, one must be cautious: we do not guarantee ABI compatibility between releases. Therefore, extreme caution should be exercised when attempting to upgrade this way!</p></div>
+<div class="section">
+<h3><a name="Q:_Does_AsterixDB_work_on_YARN_for_Windows"></a>Q: Does AsterixDB work on YARN for Windows?</h3>
+<p>A: In general, yes! It has been done without significant issues. However, it is an infrequent use case, so expect the deployment to have some hiccups. We&#x2019;re always listening on the <a class="externalLink" href="mailto:users@asterixdb.apache.org">users@asterixdb.apache.org</a> mailing list for any issues.</p></div></div>
+                  </div>
+            </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container-fluid">
+              <div class="row span12">Copyright &copy;                    2017
+                        <a href="https://www.apache.org/">The Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+        feather logo, and the Apache AsterixDB project logo are either
+        registered trademarks or trademarks of The Apache Software
+        Foundation in the United States and other countries.
+        All other marks mentioned may be trademarks or registered
+        trademarks of their respective owners.</div>
+                  
+        
+                </div>
+    </footer>
+  </body>
+</html>

