zeppelin-commits mailing list archives

From zjf...@apache.org
Subject svn commit: r1867691 [24/41] - in /zeppelin/site: docs/0.8.2/ docs/0.8.2/assets/ docs/0.8.2/assets/themes/ docs/0.8.2/assets/themes/zeppelin/ docs/0.8.2/assets/themes/zeppelin/bootstrap/ docs/0.8.2/assets/themes/zeppelin/bootstrap/css/ docs/0.8.2/asset...
Date Sun, 29 Sep 2019 07:08:15 GMT
Added: zeppelin/site/docs/0.8.2/search_data.json
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.8.2/search_data.json?rev=1867691&view=auto
==============================================================================
--- zeppelin/site/docs/0.8.2/search_data.json (added)
+++ zeppelin/site/docs/0.8.2/search_data.json Sun Sep 29 07:08:10 2019
@@ -0,0 +1,925 @@
+{
+  
+
+    "interpreter-alluxio": {
+      "title": "Alluxio Interpreter for Apache Zeppelin",
+      "content"  : "Alluxio Interpreter for Apache ZeppelinOverviewAlluxio is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks.Configuration      Name    Class    Description        alluxio.master.hostname    localhost    Alluxio master hostname        alluxio.master.port    19998    Alluxio master port  Enabling Alluxio InterpreterIn a notebook, to enable the Alluxio interpreter, click on the Gear icon and select Alluxio.Using the Alluxio InterpreterIn a paragraph, use %alluxio to select the Alluxio interpreter and then input all commands.%alluxiohelpTip : Use ( Ctrl + . ) for autocompletion.Interpreter CommandsThe Alluxio interpreter accepts the following commands.            Operation      Syntax      Description              cat      cat "path"      Print the content of the file to the console.              chgrp      chgrp "group" "path"      Change the grou
 p of the directory or file.              chmod      chmod "permission" "path"      Change the permission of the directory or file.              chown      chown "owner" "path"      Change the owner of the directory or file.              copyFromLocal      copyFromLocal "source path" "remote path"      Copy the specified file specified by "source path" to the path specified by "remote path".      This command will fail if "remote path" already exists.              copyToLocal      copyToLocal "remote path" "local path"      Copy the specified file from the path specified by "remote path" to a local destination.              count      count "path"      Display the number of folders and files matching the specified prefix in "path".     
          du      du "path"      Display the size of a file or a directory specified by the input path.              fileInfo      fileInfo "path"      Print the information of the blocks of a specified file.              free      free "path"      Free a file or all files under a directory from Alluxio. If the file/directory is also      in under storage, it will still be available there.              getCapacityBytes      getCapacityBytes      Get the capacity of the AlluxioFS.              getUsedBytes      getUsedBytes      Get number of bytes used in the AlluxioFS.              load      load "path"      Load the data of a file or a directory from under storage into Alluxio.              loadMetadata      loadMetadata "path"      Load the metadata of a file or a directory from under storage into Alluxio.              location      location "path"      Display a list of hos
 ts that have the file data.              ls      ls "path"      List all the files and directories directly under the given path with information such as      size.              mkdir      mkdir "path1" ... "pathn"      Create directory(ies) under the given paths, along with any necessary parent directories.      Multiple paths separated by spaces or tabs. This command will fail if any of the given paths      already exist.              mount      mount "path" "uri"      Mount the underlying file system path "uri" into the Alluxio namespace as "path". The "path"      is assumed not to exist and is created by the operation. No data or metadata is loaded from under      storage into Alluxio. After a path is mounted, operations on objects under the mounted path are      mirror to the mounted under storage.              mv      mv "sour
 ce" "destination"      Move a file or directory specified by "source" to a new location "destination". This command      will fail if "destination" already exists.              persist      persist "path"      Persist a file or directory currently stored only in Alluxio to the underlying file system.              pin      pin "path"      Pin the given file to avoid evicting it from memory. If the given path is a directory, it      recursively pins all the files contained and any new files created within this directory.              report      report "path"      Report to the master that a file is lost.              rm      rm "path"      Remove a file. This command will fail if the given path is a directory rather than a file.              setTtl      setTtl "time"      Set the TTL (time to live) in milliseconds t
 o a file.              tail      tail "path"      Print the last 1KB of the specified file to the console.              touch      touch "path"      Create a 0-byte file at the specified location.              unmount      unmount "path"      Unmount the underlying file system path mounted in the Alluxio namespace as "path". Alluxio      objects under "path" are removed from Alluxio, but they still exist in the previously mounted      under storage.              unpin      unpin "path"      Unpin the given file to allow Alluxio to evict this file again. If the given path is a      directory, it recursively unpins all files contained and any new files created within this      directory.              unsetTtl      unsetTtl      Remove the TTL (time to live) setting from a file.      How to test it's workingBe sure to have configured correctly the Alluxio interpreter, the
 n open a new paragraph and type one of the above commands. Below is a simple example showing how to interact with the Alluxio interpreter. The following steps are performed: using the sh interpreter, a new text file is created on the local machine; using the Alluxio interpreter: the content of the afs (Alluxio File System) root is listed; the previously created file is copied to afs; the content of the afs root is listed again to check that the newly copied file exists; the content of the copied file is shown (using the tail command); the file previously copied to afs is copied to the local machine; using the sh interpreter, the existence of the new file copied from Alluxio is checked and its content is shown.  ",
+      "url": " /interpreter/alluxio",
+      "group": "interpreter",
+      "excerpt": "Alluxio is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks."
+    }
+    ,
+    
+  
+
+    "usage-display-system-angular-backend": {
+      "title": "Backend Angular API in Apache Zeppelin",
+      "content"  : "Backend Angular API in Apache ZeppelinOverviewAngular display system treats output as a view template for AngularJS.It compiles templates and displays them inside of Apache Zeppelin. Zeppelin provides a gateway between your interpreter and your compiled AngularJS view templates.Therefore, you can not only update scope variables from your interpreter but also watch them in the interpreter, which is a JVM process.Basic UsagePrint AngularJS viewTo use angular display system, you should start with %angular.Since name is not defined, Hello will display Hello.Please Note: Display system is backend independent.Bind / Unbind VariablesThrough ZeppelinContext, you can bind / unbind variables to AngularJS view. Currently, it only works in Spark Interpreter ( scala ).// bind my 'object' as angular scope variable 'name' in current notebook.z.angularBind(String name, Object object)// bind my 'object' as angular scope variable 'name'
 in all notebooks related to current interpreter.z.angularBindGlobal(String name, Object object)// unbind angular scope variable 'name' in current notebook.z.angularUnbind(String name)// unbind angular scope variable 'name' in all notebooks related to current interpreter.z.angularUnbindGlobal(String name)Using the above example, let's bind world variable to name. Then you can see AngularJs view is immediately updated.Watch / Unwatch VariablesThrough ZeppelinContext, you can watch / unwatch variables in AngularJs view. Currently, it only works in Spark Interpreter ( scala ).// register for angular scope variable 'name' (notebook)z.angularWatch(String name, (before, after) => { ... })// unregister watcher for angular variable 'name' (notebook)z.angularUnwatch(String name)// register for angular scope variable 'name' (global)z.angularWatchGlobal(String name, (before, after) =
 > { ... })// unregister watcher for angular variable 'name' (global)z.angularUnwatchGlobal(String name)Let's make a button. When it is clicked, the value of run will be increased 1 by 1.z.angularBind("run", 0) will initialize run to zero. And then, it will be also applied to run in z.angularWatch().When the button is clicked, you'll see both run and numWatched are incremented by 1.Let's make it Simpler and more IntuitiveIn this section, we will introduce a simpler and more intuitive way of using Angular Display System in Zeppelin.Here are some usages.Import// In notebook scopeimport org.apache.zeppelin.display.angular.notebookscope._import AngularElem._// In paragraph scopeimport org.apache.zeppelin.display.angular.paragraphscope._import AngularElem._Display Element// automatically convert to string and print with %angular display system directive in front.<div></div>.displayEvent Handler// 
 on click<div></div>.onClick(() => {   my callback routine}).display// on change<div></div>.onChange(() => {  my callback routine}).display// arbitrary event<div></div>.onEvent("ng-click", () => {  my callback routine}).displayBind Model// bind model<div></div>.model("myModel").display// bind model with initial value<div></div>.model("myModel", initialValue).displayInteract with Model// read modelAngularModel("myModel")()// update modelAngularModel("myModel", "newValue")Example: Basic UsageUsing the above basic usages, you can apply them like below examples.Display Elements<div style="color:blue">  <h4>Hello Angular Display System</h4></div>.displ
 ayOnClick Event<div class="btn btn-success">  Click me</div>.onClick{() =>  // callback for button click}.displayBind Model  <div>{{{{myModel}}}}</div>.model("myModel", "Initial Value").displayInteract With Model// read the valueAngularModel("myModel")()// update the valueAngularModel("myModel", "New value")Example: String ConverterUsing below example, you can convert the lowercase string to uppercase.// clear previously created angular object.AngularElem.disassociateval button = <div class="btn btn-success btn-sm">Convert</div>.onClick{() =>  val inputString = AngularModel("input")().toString  AngularModel("title", inputString.toUpperCase)}<div>  { <h4> {{{{title}}}}</h4>.model(
 "title", "Please type text to convert uppercase") }   Your text { <input type="text"></input>.model("input", "") }  {button}</div>.display",
+      "url": " /usage/display_system/angular_backend",
+      "group": "usage/display_system",
+      "excerpt": "Apache Zeppelin provides a gateway between your interpreter and your compiled AngularJS view templates. You can not only update scope variables from your interpreter but also watch them in the interpreter, which is a JVM process."
+    }
+    ,
+    
+  
+
+    "usage-display-system-angular-frontend": {
+      "title": "Frontend Angular API in Apache Zeppelin",
+      "content"  : "Frontend Angular API in Apache ZeppelinBasic UsageIn addition to the backend Angular API to handle Angular objects binding, Apache Zeppelin also exposes a simple AngularJS z object on the front-end side to expose the same capabilities.This z object is accessible in the Angular isolated scope for each paragraph.Bind / Unbind VariablesThrough the z, you can bind / unbind variables to AngularJS view.Bind a value to an angular object and a mandatory target paragraph:%angular<form class="form-inline">  <div class="form-group">    <label for="superheroId">Super Hero: </label>    <input type="text" class="form-control" id="superheroId" placeholder="Superhero name ..." ng-model="superhero"></input>  </div>  <button
 type="submit" class="btn btn-primary" ng-click="z.angularBind('superhero',superhero,'20160222-232336_1472609686')"> Bind</button></form>Unbind/remove a value from angular object and a mandatory target paragraph:%angular<form class="form-inline">  <button type="submit" class="btn btn-primary" ng-click="z.angularUnbind('superhero','20160222-232336_1472609686')"> UnBind</button></form>The signatures of the z.angularBind() / z.angularUnbind() functions are:// Bindz.angularBind(angularObjectName, angularObjectValue, paragraphId);// Unbindz.angularUnbind(angularObjectName, paragraphId);All the parameters are mandatory.Run ParagraphYou can also trigger paragraph execution by calling the z.runParagraph() funct
 ion passing the appropriate paragraphId: %angular<form class="form-inline">  <div class="form-group">    <label for="paragraphId">Paragraph Id: </label>    <input type="text" class="form-control" id="paragraphId" placeholder="Paragraph Id ..." ng-model="paragraph"></input>  </div>  <button type="submit" class="btn btn-primary" ng-click="z.runParagraph(paragraph)"> Run Paragraph</button></form>Overriding dynamic form with Angular ObjectThe front-end Angular Interaction API has been designed to offer richer form capabilities and variable binding. With the existing Dynamic Form system you can already create input text, select and checkbox forms but the c
 hoice is rather limited and the look & feel cannot be changed.The idea is to create a custom form using plain HTML/AngularJS code and bind actions on this form to push/remove Angular variables to targeted paragraphs using this new API. Consequently, if you use the Dynamic Form syntax in a paragraph and there is a bound Angular object having the same name as the ${formName}, the Angular object will have higher priority and the Dynamic Form will not be displayed. Example: Feature matrix comparisonHow does the front-end AngularJS API compare to the backend Angular API? Below is a comparison matrix for both APIs:                        Actions            Front-end API            Back-end API                                Initiate binding            z.angularBind(var, initialValue, paragraphId)            z.angularBind(var, initialValue)                            Update value            same as an ordinary AngularJS scope variable, or z.angularBind(var, newValue, paragraphId)     
        z.angularBind(var, newValue)                            Watching value            same as an ordinary AngularJS scope variable            z.angularWatch(var, (oldVal, newVal) => ...)                            Destroy binding            z.angularUnbind(var, paragraphId)            z.angularUnbind(var)                            Executing Paragraph            z.runParagraph(paragraphId)            z.run(paragraphId)                            Executing Paragraph (Specific paragraphs in other notes)                        z.run(noteId, paragraphId)                            Executing note                        z.runNote(noteId)                     Both APIs are pretty similar, except for value watching, which is done naturally by AngularJS internals on the front-end and by user custom watcher functions in the back-end.There is also a slight difference in terms of scope. The front-end API limits the Angular object binding to a paragraph scope whereas the back-end API allows you to
 bind an Angular object at the global or note scope. This restriction has been designed purposely to avoid Angular object leaks and scope pollution.",
+      "url": " /usage/display_system/angular_frontend",
+      "group": "usage/display_system",
+      "excerpt": "In addition to the back-end API to handle Angular objects binding, Apache Zeppelin exposes a simple AngularJS z object on the front-end side to expose the same capabilities."
+    }
+    ,
+    
+  
+  
+
+    "setup-security-authentication-nginx": {
+      "title": "HTTP Basic Auth using NGINX",
+      "content"  : "Authentication for NGINXThe built-in authentication mechanism is the recommended way to authenticate. If you want to authenticate using NGINX and HTTP basic auth, please read this document.HTTP Basic Authentication using NGINXQuote from Wikipedia: NGINX is a web server. It can act as a reverse proxy server for HTTP, HTTPS, SMTP, POP3, and IMAP protocols, as well as a load balancer and an HTTP cache.So you can use an NGINX server as a proxy server to serve HTTP Basic Authentication as a separate process alongside the Zeppelin server.Here are instructions for setting up NGINX as a front-end authentication server with Zeppelin behind it.These instructions are based on Ubuntu 14.04 LTS but may work on other operating systems with a few configuration changes.Install NGINX server on your server instanceYou can install the NGINX server on the same box where Zeppelin is installed or on a separate box dedicated to serving as a proxy server.$ apt-get install nginxNOTE : On NGINX versions before 1.3.13,
 proxying for WebSocket may not fully work. Please use the latest version of NGINX. See: NGINX documentation.Setup init script in NGINXIn most cases, the NGINX configuration is located under /etc/nginx/sites-available. Create your own configuration or add your existing configuration at /etc/nginx/sites-available.$ cd /etc/nginx/sites-available$ touch my-zeppelin-auth-settingNow add this script to the my-zeppelin-auth-setting file. You can comment out the optional lines if you want to serve Zeppelin under the regular HTTP port 80.upstream zeppelin {    server [YOUR-ZEPPELIN-SERVER-IP]:[YOUR-ZEPPELIN-SERVER-PORT];   # For security, it is highly recommended to make this address/port not publicly accessible}# Zeppelin Websiteserver {    listen [YOUR-ZEPPELIN-WEB-SERVER-PORT];    listen 443 ssl;                                      # optional, to serve HTTPS connection    server_name [YOUR-ZEPPELIN-SERVER-HOST];             # for example: zeppelin.mycompany.com    ssl_certificate [PATH-TO-YOUR-CERT
 -FILE];            # optional, to serve HTTPS connection    ssl_certificate_key [PATH-TO-YOUR-CERT-KEY-FILE];    # optional, to serve HTTPS connection    if ($ssl_protocol = "") {        rewrite ^ https://$host$request_uri? permanent;  # optional, to force use of HTTPS    }    location / {    # For regular websever support        proxy_pass http://zeppelin;        proxy_set_header X-Real-IP $remote_addr;        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;        proxy_set_header Host $http_host;        proxy_set_header X-NginX-Proxy true;        proxy_redirect off;        auth_basic "Restricted";        auth_basic_user_file /etc/nginx/.htpasswd;    }    location /ws {  # For websocket support        proxy_pass http://zeppelin/ws;        proxy_http_version 1.1;        proxy_set_header Upgrade websocket;        proxy_set_header Connection upgrade;        proxy_read_timeout 86400;    }}Then make a symbolic link to this file from /etc/
 nginx/sites-enabled/ to enable the configuration above when NGINX reloads.$ ln -s /etc/nginx/sites-available/my-zeppelin-auth-setting /etc/nginx/sites-enabled/my-zeppelin-auth-settingSetup user credential into .htpasswd file and restart serverNow you need to set up the .htpasswd file that holds the list of authenticated user credentials for the NGINX server.$ cd /etc/nginx$ htpasswd -c .htpasswd [YOUR-ID]NEW passwd: [YOUR-PASSWORD]RE-type new passwd: [YOUR-PASSWORD-AGAIN]Or you can use your own Apache .htpasswd file in another location by setting the property: auth_basic_user_fileRestart NGINX server.$ service nginx restartThen check that HTTP Basic Authentication works in a browser. If you see the regular basic auth popup and are able to log in with the credentials you entered into .htpasswd, you are good to go.More security considerationUsing an HTTPS connection with Basic Authentication is highly recommended since basic auth without encryption may expose your important credential information over the network.Using the
 Shiro Security feature built into Zeppelin is recommended if you prefer an all-in-one solution for authentication, but NGINX may provide an ad-hoc solution for reusing the authentication served by your system's NGINX server or in case you need to separate authentication from the Zeppelin server.It is recommended to isolate the Zeppelin server from direct connections from the public internet or external services to secure your Zeppelin instance from unexpected attacks or problems caused by the public zone.Another optionAnother option is to have an authentication server that can verify user credentials in an LDAP server.If an incoming request to the Zeppelin server does not have a cookie with user information encrypted with the authentication server public key, the user is redirected to the authentication server. Once the user is verified, the authentication server redirects the browser to a specific URL in the Zeppelin server which sets the authentication cookie in the browser.The end result is that all
 requests to the Zeppelin web server have the authentication cookie which contains user and groups information.",
+      "url": " /setup/security/authentication_nginx",
+      "group": "setup/security",
+      "excerpt": "There are multiple ways to enable authentication in Apache Zeppelin. This page describes HTTP basic auth using NGINX."
+    }
+    ,
+    
+  
+
+    "usage-display-system-basic": {
+      "title": "Basic Display System in Apache Zeppelin",
+      "content"  : "Basic Display System in Apache ZeppelinTextBy default, Apache Zeppelin prints interpreter response as a plain text using text display system.You can explicitly say you're using text display system.HtmlWith %html directive, Zeppelin treats your output as HTMLMathematical expressionsHTML display system automatically formats mathematical expression using MathJax. You can use( INLINE EXPRESSION ) and $$ EXPRESSION $$ to format. For exampleTableIf you have data that row separated by n (newline) and column separated by t (tab) with first row as header row, for exampleYou can simply use %table display system to leverage Zeppelin's built in visualization.If table contents start with %html, it is interpreted as an HTML.Note : Display system is backend independent.NetworkWith the %network directive, Zeppelin treats your output as a graph. Zeppelin can leverage the Property Graph Model.What is the Labelled Property Graph Model?A Property Graph is a graph tha
 t has these elements:a set of verticeseach vertex has a unique identifier.each vertex has a set of outgoing edges.each vertex has a set of incoming edges.each vertex has a collection of properties defined by a map from key to valuea set of edgeseach edge has a unique identifier.each edge has an outgoing tail vertex.each edge has an incoming head vertex.each edge has a label that denotes the type of relationship between its two vertices.each edge has a collection of properties defined by a map from key to value.A Labelled Property Graph is a Property Graph where the nodes can be tagged with labels representing their different roles in the graph modelWhat are the APIs?The new NETWORK visualization is based on JSON with the following params:"nodes" (mandatory): list of nodes of the graph every node can have the following params:"id" (mandatory): the id of the node (must be unique);"label": the main Label of the node;
 "labels": the list of the labels of the node;"data": the data attached to the node;"edges": list of the edges of the graph;"id" (mandatory): the id of the edge (must be unique);"source" (mandatory): the id of source node of the edge;"target" (mandatory): the id of target node of the edge;"label": the main type of the edge;"data": the data attached to the edge;"labels": a map (K, V) where K is the node label and V is the color of the node;"directed": (true/false, default false) which tells whether the graph is directed or not;"types": a distinct list of the edge types of the graphIf you click on a node or edge, a list of the entity's properties is shown at the bottom of the paragraphThis kind of graph can easily be flattened in order to support other visualization formats provided by Zeppelin.How to use it?An example of a simple
 graph%sparkprint(s"""%network {    "nodes": [        {"id": 1},        {"id": 2},        {"id": 3}    ],    "edges": [        {"source": 1, "target": 2, "id" : 1},        {"source": 2, "target": 3, "id" : 2},        {"source": 1, "target": 2, "id" : 3},        {"source": 1, "target": 2, "id" : 4},        {"source": 2, "target": 1, "id" : 5},        {"source": 2, "target": 1, "id" : 6}    ]}""")that will look like:A little more complex graph:%sparkprint(s"""%network {    "nodes":
 [{"id": 1, "label": "User", "data": {"fullName":"Andrea Santurbano"}},{"id": 2, "label": "User", "data": {"fullName":"Lee Moon Soo"}},{"id": 3, "label": "Project", "data": {"name":"Zeppelin"}}],    "edges": [{"source": 2, "target": 1, "id" : 1, "label": "HELPS"},{"source": 2, "target": 3, "id" : 2, "label": "CREATE"},{"source": 1, "target": 3, "id" : 3, "label": "CONTRIBUTE_TO",
 "data": {"oldPR": "https://github.com/apache/zeppelin/pull/1582"}}],    "labels": {"User": "#8BC34A", "Project": "#3071A9"},    "directed": true,    "types": ["HELPS", "CREATE", "CONTRIBUTE_TO"]}""")that will look like:",
+      "url": " /usage/display_system/basic",
+      "group": "usage/display_system",
+      "excerpt": "There are 3 basic display systems in Apache Zeppelin. By default, Zeppelin prints the interpreter response as plain text using the text display system. With the %html directive, Zeppelin treats your output as HTML. You can also simply use the %table display system..."
+    }
+    ,
+    
+  
+
+    "interpreter-beam": {
+      "title": "Beam interpreter in Apache Zeppelin",
+      "content"  : "Beam interpreter for Apache ZeppelinOverviewApache Beam is an open source unified platform for data processing pipelines. A pipeline can be built using one of the Beam SDKs.The execution of the pipeline is done by different Runners. Currently, Beam supports Apache Flink Runner, Apache Spark Runner, and Google Dataflow Runner.How to useBasically, you can write normal Beam Java code in which you choose the Runner. You should write the main method inside a class because the interpreter invokes this main method to execute the pipeline. Unlike the normal Zeppelin pattern, each paragraph is treated as a separate job with no relation to any other paragraph.The following is a demonstration of a word count example with data represented as an array of strings. It can also read data from files by replacing Create.of(SENTENCES).withCoder(StringUtf8Coder.of()) with TextIO.Read.from("path/to/filename.txt")%beam// most used importsimport org.apache.beam.
 sdk.coders.StringUtf8Coder;import org.apache.beam.sdk.transforms.Create;import java.io.Serializable;import java.util.Arrays;import java.util.List;import java.util.ArrayList;import org.apache.beam.runners.direct.*;import org.apache.beam.sdk.runners.*;import org.apache.beam.sdk.options.*;import org.apache.beam.runners.flink.*;import org.apache.beam.sdk.Pipeline;import org.apache.beam.sdk.io.TextIO;import org.apache.beam.sdk.options.PipelineOptionsFactory;import org.apache.beam.sdk.transforms.Count;import org.apache.beam.sdk.transforms.DoFn;import org.apache.beam.sdk.transforms.MapElements;import org.apache.beam.sdk.transforms.ParDo;import org.apache.beam.sdk.transforms.SimpleFunction;import org.apache.beam.sdk.values.KV;import org.apache.beam.sdk.options.PipelineOptions;public class MinimalWordCount {  static List<String> s = new ArrayList<>();  static final String[] SENTENCES_ARRAY = new String[] {    "Hadoop is the Elephant King!",
    "A yellow and elegant thing.",    "He never forgets",    "Useful data, or lets",    "An extraneous element cling!",    "A wonderful king is Hadoop.",    "The elephant plays well with Sqoop.",    "But what helps him to thrive",    "Are Impala, and Hive,",    "And HDFS in the group.",    "Hadoop is an elegant fellow.",    "An elephant gentle and mellow.",    "He never gets mad,",    "Or does anything bad,",    "Because, at his core, he is yellow",    };    static final List<String> SENTENCES = Arrays.asList(SENTENCES_ARRAY);  public static void main(String[] args) {    PipelineOptions options = PipelineOptionsFactory.create().as(PipelineOptions.class);    options.setRunner(FlinkRunner.class);    Pipeline p = Pipeline.create(o
 ptions);    p.apply(Create.of(SENTENCES).withCoder(StringUtf8Coder.of()))         .apply("ExtractWords", ParDo.of(new DoFn<String, String>() {           @ProcessElement           public void processElement(ProcessContext c) {             for (String word : c.element().split("[^a-zA-Z']+")) {               if (!word.isEmpty()) {                 c.output(word);               }             }           }         }))        .apply(Count.<String> perElement())        .apply("FormatResults", ParDo.of(new DoFn<KV<String, Long>, String>() {          @ProcessElement          public void processElement(DoFn<KV<String, Long>, String>.ProcessContext arg0)            throws Exception {            s.add("n" + arg0.element().getKey() + "t" + arg0.element().getValue());            }        }));    p.run();    System.out.
 println("%table wordtcount");    for (int i = 0; i < s.size(); i++) {      System.out.print(s.get(i));    }  }}",
+      "url": " /interpreter/beam",
+      "group": "interpreter",
+      "excerpt": "Apache Beam is an open source, unified programming model that you can use to create a data processing pipeline."
+    }
+    ,
+    
+  
+
+    "interpreter-bigquery": {
+      "title": "BigQuery Interpreter for Apache Zeppelin",
+      "content"  : "BigQuery Interpreter for Apache ZeppelinOverviewBigQuery is a highly scalable no-ops data warehouse in the Google Cloud Platform. Querying massive datasets can be time consuming and expensive without the right hardware and infrastructure. Google BigQuery solves this problem by enabling super-fast SQL queries against append-only tables using the processing power of Google's infrastructure. Simply move your data into BigQuery and let us handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.  Configuration      Name    Default Value    Description        zeppelin.bigquery.project_id          Google Project Id        zeppelin.bigquery.wait_time    5000    Query Timeout in Milliseconds        zeppelin.bigquery.max_no_of_rows    100000    Max result set size        zeppelin.bigquery.sql_dialect        BigQuery SQL dialect (standardSQL or legacySQL
 ). If empty, [query prefix](https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql#sql-prefix) like '#standardSQL' can be used.  BigQuery APIZeppelin is built against BigQuery API version v2-rev265-1.21.0 - API JavadocsEnabling the BigQuery InterpreterIn a notebook, to enable the BigQuery interpreter, click the Gear icon and select bigquery.Provide Application Default CredentialsWithin Google Cloud Platform (e.g. Google App Engine, Google Compute Engine),built-in credentials are used by default.Outside of GCP, follow the Google API authentication instructions for Zeppelin Google Cloud StorageUsing the BigQuery InterpreterIn a paragraph, use %bigquery.sql to select the BigQuery interpreter and then input SQL statements against your datasets stored in BigQuery.You can use BigQuery SQL Reference to build your own SQL.For Example, SQL to query for top 10 departure delays across airports using the flights public dataset%bigquery.sqlSELECT departure_ai
 rport,count(case when departure_delay>0 then 1 else 0 end) as no_of_delays FROM [bigquery-samples:airline_ontime_data.flights] group by departure_airport order by 2 desc limit 10Another Example, SQL to query for most commonly used java packages from the github data hosted in BigQuery %bigquery.sqlSELECT  package,  COUNT(*) countFROM (  SELECT    REGEXP_EXTRACT(line, r' ([a-z0-9._]*).') package,    id  FROM (    SELECT      SPLIT(content, 'n') line,      id    FROM      [bigquery-public-data:github_repos.sample_contents]    WHERE      content CONTAINS 'import'      AND sample_path LIKE '%.java'    HAVING      LEFT(line, 6)='import' )  GROUP BY    package,    id )GROUP BY  1ORDER BY  count DESCLIMIT  40Technical descriptionFor in-depth technical details on current implementation please refer to bigquery/README.md.",
+      "url": " /interpreter/bigquery",
+      "group": "interpreter",
+      "excerpt": "BigQuery is a highly scalable no-ops data warehouse in the Google Cloud Platform."
+    }
+    ,
+    
+  
+
+    "interpreter-cassandra": {
+      "title": "Cassandra CQL Interpreter for Apache Zeppelin",
+      "content"  : "Cassandra CQL Interpreter for Apache Zeppelin      Name    Class    Description        %cassandra    CassandraInterpreter    Provides interpreter for Apache Cassandra CQL query language  Enabling Cassandra InterpreterIn a notebook, to enable the Cassandra interpreter, click on the Gear icon and select Cassandra  Using the Cassandra InterpreterIn a paragraph, use %cassandra to select the Cassandra interpreter and then input all commands.To access the interactive help, type HELP;    Interpreter CommandsThe Cassandra interpreter accepts the following commands            Command Type      Command Name      Description              Help command      HELP      Display the interactive help menu              Schema commands      DESCRIBE KEYSPACE, DESCRIBE CLUSTER, DESCRIBE TABLES ...      Custom commands to describe the Cassandra schema              Option commands      @consistency, @retryPolicy, @fetchSize ...      Inject runtime options to all statements in the parag
 raph              Prepared statement commands      @prepare, @bind, @remove_prepared      Let you register a prepared command and re-use it later by injecting bound values              Native CQL statements      All CQL-compatible statements (SELECT, INSERT, CREATE, ...)      All CQL statements are executed directly against the Cassandra server      CQL statementsThis interpreter is compatible with any CQL statement supported by Cassandra. Ex:INSERT INTO users(login,name) VALUES('jdoe','John DOE');SELECT * FROM users WHERE login='jdoe';Each statement should be separated by a semi-colon ( ; ) except the special commands below:@prepare@bind@remove_prepare@consistency@serialConsistency@timestamp@retryPolicy@fetchSize@requestTimeOutMulti-line statements as well as multiple statements on the same line are also supported as long as they are separated by a semi-colon. Ex:USE spark_demo;SELECT * FROM albums_by_country LIMIT 1; SELECT * FROM co
 untries LIMIT 1;SELECT *FROM artistsWHERE login='jlennon';Batch statements are supported and can span multiple lines, as well as DDL (CREATE/ALTER/DROP) statements:BEGIN BATCH    INSERT INTO users(login,name) VALUES('jdoe','John DOE');    INSERT INTO users_preferences(login,account_type) VALUES('jdoe','BASIC');APPLY BATCH;CREATE TABLE IF NOT EXISTS test(    key int PRIMARY KEY,    value text);CQL statements are case-insensitive (except for column names and values). This means that the following statements are equivalent and valid:INSERT INTO users(login,name) VALUES('jdoe','John DOE');Insert into users(login,name) vAlues('hsue','Helen SUE');The complete list of all CQL statements and versions can be found below:         Cassandra Version     Documentation Link           3.x                       http://docs.datastax.com/en/cql/3.3/cql/cqlInt
 ro.html                        2.2                       http://docs.datastax.com/en/cql/3.3/cql/cqlIntro.html                        2.1 & 2.0                       http://docs.datastax.com/en/cql/3.1/cql/cqlintroc.html                        1.2                       http://docs.datastax.com/en/cql/3.0/cql/aboutCQL.html                 Comments in statementsIt is possible to add comments between statements. Single line comments start with the hash sign (#) or double slashes (//). Multi-line comments are enclosed between /** and **/. Ex:#Single line comment style 1INSERT INTO users(login,name) VALUES('jdoe','John DOE');//Single line comment style 2/** Multi line comments **/Insert into users(login,name) vAlues('hsue','Helen SUE');Syntax ValidationThe interpreter ships with a built-in syntax validator. This validator only checks for basic syntax errors.All CQL-related syntax validation is delegated directl
 y to CassandraMost of the time, syntax errors are due to missing semi-colons between statements or typo errors.Schema commandsTo make schema discovery easier and more interactive, the following commands are supported:         Command     Description           DESCRIBE CLUSTER;     Show the current cluster name and its partitioner           DESCRIBE KEYSPACES;     List all existing keyspaces in the cluster and their configuration (replication factor, durable write ...)           DESCRIBE TABLES;     List all existing keyspaces in the cluster and for each, all the tables name           DESCRIBE TYPES;     List all existing keyspaces in the cluster and for each, all the user-defined types name           DESCRIBE FUNCTIONS;     List all existing keyspaces in the cluster and for each, all the functions name           DESCRIBE AGGREGATES;     List all existing keyspaces in the cluster and for each, all the aggregates name           DESCRIBE MATERIALIZED VIEWS;     List all existing keyspa
 ces in the cluster and for each, all the materialized views name           DESCRIBE KEYSPACE <keyspacename>;     Describe the given keyspace configuration and all its table details (name, columns, ...)           DESCRIBE TABLE (<keyspacename>).<tablename>;             Describe the given table. If the keyspace is not provided, the current logged in keyspace is used.        If there is no logged in keyspace, the default system keyspace is used.        If no table is found, an error message is raised                DESCRIBE TYPE (<keyspacename>).<typename>;             Describe the given type(UDT). If the keyspace is not provided, the current logged in keyspace is used.        If there is no logged in keyspace, the default system keyspace is used.        If no type is found, an error message is raised                DESCRIBE FUNCTION (<keyspacename>).<functionname>;     Describe the given 
 function. If the keyspace is not provided, the current logged in keyspace is used.         If there is no logged in keyspace, the default system keyspace is used.         If no function is found, an error message is raised                DESCRIBE AGGREGATE (<keyspacename>).<aggregatename>;     Describe the given aggregate. If the keyspace is not provided, the current logged in keyspace is used.         If there is no logged in keyspace, the default system keyspace is used.         If no aggregate is found, an error message is raised                DESCRIBE MATERIALIZED VIEW (<keyspacename>).<view_name>;     Describe the given view. If the keyspace is not provided, the current logged in keyspace is used.         If there is no logged in keyspace, the default system keyspace is used.         If no view is found, an error message is raised         The schema objects (cluster, keyspace, table, type, function and aggregate) are disp
 layed in a tabular format.There is a drop-down menu on the top left corner to expand objects details. On the top right menu is shown the Icon legend.  Runtime ParametersSometimes you want to be able to pass runtime query parameters to your statements.Those parameters are not part of the CQL specs and are specific to the interpreter.Below is the list of all parameters:         Parameter     Syntax     Description           Consistency Level     @consistency=value     Apply the given consistency level to all queries in the paragraph           Serial Consistency Level     @serialConsistency=value     Apply the given serial consistency level to all queries in the paragraph           Timestamp     @timestamp=long value             Apply the given timestamp to all queries in the paragraph.        Please note that timestamp value passed directly in CQL statement will override this value                 Retry Policy     @retryPolicy=value     Apply the given retry policy to all queries in t
 he paragraph           Fetch Size     @fetchSize=integer value     Apply the given fetch size to all queries in the paragraph           Request Time Out     @requestTimeOut=integer value     Apply the given request timeout in millisecs to all queries in the paragraph    Some parameters only accept restricted values:         Parameter     Possible Values           Consistency Level     ALL, ANY, ONE, TWO, THREE, QUORUM, LOCALONE, LOCALQUORUM, EACHQUORUM           Serial Consistency Level     SERIAL, LOCALSERIAL           Timestamp     Any long value           Retry Policy     DEFAULT, DOWNGRADINGCONSISTENCY, FALLTHROUGH, LOGGINGDEFAULT, LOGGINGDOWNGRADING, LOGGINGFALLTHROUGH           Fetch Size     Any integer value    Please note that you should not add semi-colon ( ; ) at the end of each parameter statementSome examples:CREATE TABLE IF NOT EXISTS spark_demo.ts(    key int PRIMARY KEY,    value text);TRUNCATE spark_demo.ts;// Timestamp in the past@timestamp=10// Force timestamp dir
 ectly in the first insertINSERT INTO spark_demo.ts(key,value) VALUES(1,'first insert') USING TIMESTAMP 100;// Select some data to make the clock turnSELECT * FROM spark_demo.albums LIMIT 100;// Now insert using the timestamp parameter set at the beginning(10)INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert');// Check for the result. You should see 'first insert'SELECT value FROM spark_demo.ts WHERE key=1;Some remarks about query parameters:many query parameters can be set in the same paragraphif the same query parameter is set many times with different values, the interpreter only takes into account the first valueeach query parameter applies to all CQL statements in the same paragraph, unless you override the option using plain CQL text (like forcing timestamp with the USING clause)the order of each query parameter with regard to CQL statements does not matterSupport for Prepared StatementsFor performance reasons, it is better
  to prepare statements before-hand and reuse them later by providing bound values.This interpreter provides 3 commands to handle prepared and bound statements:@prepare@bind@remove_preparedExample:@prepare[statement-name]=...@bind[statement-name]=’text’, 1223, ’2015-07-30 12:00:01’, null, true, [‘list_item1’, ’list_item2’]@bind[statement-name-with-no-bound-value]@remove_prepare[statement-name]@prepareYou can use the syntax "@prepare[statement-name]=SELECT..." to create a prepared statement.The statement-name is mandatory because the interpreter prepares the given statement with the Java driver andsaves the generated prepared statement in an internal hash map, using the provided statement-name as search key.Please note that this internal prepared statement map is shared with all notebooks and all paragraphs becausethere is only one instance of the interpreter for CassandraIf the interpreter encounters many @prepare for the same
 statement-name (key), only the first statement will be taken into account.Example:@prepare[select]=SELECT * FROM spark_demo.albums LIMIT ?@prepare[select]=SELECT * FROM spark_demo.artists LIMIT ?For the above example, the prepared statement is SELECT * FROM spark_demo.albums LIMIT ?.SELECT * FROM spark_demo.artists LIMIT ? is ignored because an entry already exists in the prepared statements map with the key select.In the context of Zeppelin, a notebook can be scheduled to be executed at regular intervals, thus it is necessary to avoid re-preparing the same statement many times (considered an anti-pattern).@bindOnce the statement is prepared (possibly in a separate notebook/paragraph), you can bind values to it:@bind[select_first]=10Bound values are not mandatory for the @bind statement. However if you provide bound values, they need to comply with some syntax rules:String values should be enclosed between simple quotes (')Date values should be enclosed between simple quotes (
 ') and respect the formats (full list is in the documentation):yyyy-MM-dd HH:MM:ssyyyy-MM-dd HH:MM:ss.SSSnull is parsed as-isboolean (true|false) are parsed as-iscollection values must follow the standard CQL syntax:list: ['listitem1', 'listitem2', ...]set: {'setitem1', 'setitem2', …}map: {'key1': 'val1', 'key2': 'val2', …}tuple values should be enclosed between parentheses (see Tuple CQL syntax): ('text', 123, true)udt values should be enclosed between curly braces (see UDT CQL syntax): {streename: 'Beverly Hills', number: 104, zipcode: 90020, state: 'California', …}It is possible to use the @bind statement inside a batch:BEGIN BATCH   @bind[insert_user]='jdoe','John DOE'   UPDATE users SET age = 27 WHERE login='hsue';APPLY BATCH;@remove_prepareTo pr
 event a prepared statement from staying forever in the prepared statement map, you can use the @remove_prepare[statement-name] syntax to remove it.Removing a non-existing prepared statement yields no error.Using Dynamic FormsInstead of hard-coding your CQL queries, it is possible to use [Zeppelin dynamic form] syntax to inject simple-value or multiple-choice forms.The legacy mustache syntax ( {{ }} ) to bind input text and select form is still supported but is deprecated and will be removed in future releases.LegacyThe syntax for a simple parameter is: {{input_Label=default value}}. The default value is mandatory because the first time the paragraph is executed, we launch the CQL query before rendering the form so at least one value should be provided.The syntax for a multiple-choice parameter is: {{input_Label=value1 | value2 | … | valueN }}. By default the first choice is used for the CQL query the first time the paragraph is executed.Example:#Secondary index on performer styleSELECT nam
 e, country, performerFROM spark_demo.performersWHERE name='${performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}'AND styles CONTAINS '${style=Rock}';In the above example, the first CQL query will be executed for performer='Sheryl Crow' AND style='Rock'.For subsequent queries, you can change the value directly using the form.Please note that we enclosed the ${ } block between simple quotes ( ' ) because Cassandra expects a String here.We could also have used the ${style='Rock'} syntax, but then the value displayed on the form is 'Rock' and not Rock.It is also possible to use dynamic forms for prepared statements:@bind[select]=='${performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}', '${style=Rock}'Shared statesIt is possible to execute many paragraphs in parallel. However, on the back-end side, we're still using synchronous queries.Asynchrono
 us execution is only possible when it is possible to return a Future value in the InterpreterResult.It may be an interesting proposal for the Zeppelin project.Recently, Zeppelin allows you to choose the level of isolation for your interpreters (see [Interpreter Binding Mode] ).Long story short, you have 3 available bindings:shared : same JVM and same Interpreter instance for all notesscoped : same JVM but different Interpreter instances, one for each noteisolated: different JVM running a single Interpreter instance, one JVM for each noteUsing the shared binding, the same com.datastax.driver.core.Session object is used for all notes and paragraphs.Consequently, if you use the USE keyspace_name; statement to log into a keyspace, it will change the keyspace forall current users of the Cassandra interpreter because we only create 1 com.datastax.driver.core.Session objectper instance of Cassandra interpreter.The same remark does apply to the prepared statement hash map, it is shared by a
 ll users using the same instance of Cassandra interpreter.When using scoped binding, in the same JVM Zeppelin will create multiple instances of the Cassandra interpreter, thus multiple com.datastax.driver.core.Session objects. Beware of resource and memory usage using this binding ! The isolated mode is the most extreme and will create as many JVM/com.datastax.driver.core.Session object as there are distinct notes.Interpreter ConfigurationTo configure the Cassandra interpreter, go to the Interpreter menu and scroll down to change the parameters.The Cassandra interpreter is using the official Cassandra Java Driver and most of the parameters are usedto configure the Java driverBelow are the configuration parameters and their default values.        Property Name     Description     Default Value           cassandra.cluster     Name of the Cassandra cluster to connect to     Test Cluster           cassandra.compression.protocol     On wire compression. Possible values are: NONE, SNAPPY,
  LZ4     NONE           cassandra.credentials.username     If security is enabled, provide the login     none           cassandra.credentials.password     If security is enabled, provide the password     none           cassandra.hosts             Comma separated Cassandra hosts (DNS name or IP address).                Ex: 192.168.0.12,node2,node3           localhost           cassandra.interpreter.parallelism     Number of concurrent paragraphs (query blocks) that can be executed     10           cassandra.keyspace             Default keyspace to connect to.                  It is strongly recommended to keep the default value          and prefix the table name with the actual keyspace          in all of your queries                  system           cassandra.load.balancing.policy             Load balancing policy. Default = new TokenAwarePolicy(new DCAwareRoundRobinPolicy())        To specify your own policy, provide the fully qualified class name (FQCN) of your policy.        At runti
 me the interpreter will instantiate the policy using        Class.forName(FQCN)          DEFAULT           cassandra.max.schema.agreement.wait.second     Cassandra max schema agreement wait in second     10           cassandra.pooling.core.connection.per.host.local     Protocol V2 and below default = 2. Protocol V3 and above default = 1     2           cassandra.pooling.core.connection.per.host.remote     Protocol V2 and below default = 1. Protocol V3 and above default = 1     1           cassandra.pooling.heartbeat.interval.seconds     Cassandra pool heartbeat interval in secs     30           cassandra.pooling.idle.timeout.seconds     Cassandra idle time out in seconds     120           cassandra.pooling.max.connection.per.host.local     Protocol V2 and below default = 8. Protocol V3 and above default = 1     8           cassandra.pooling.max.connection.per.host.remote     Protocol V2 and below default = 2. Protocol V3 and above default = 1     2           cassandra.pooling.max.re
 quest.per.connection.local     Protocol V2 and below default = 128. Protocol V3 and above default = 1024     128           cassandra.pooling.max.request.per.connection.remote     Protocol V2 and below default = 128. Protocol V3 and above default = 256     128           cassandra.pooling.new.connection.threshold.local     Protocol V2 and below default = 100. Protocol V3 and above default = 800     100           cassandra.pooling.new.connection.threshold.remote     Protocol V2 and below default = 100. Protocol V3 and above default = 200     100           cassandra.pooling.pool.timeout.millisecs     Cassandra pool time out in millisecs     5000           cassandra.protocol.version     Cassandra binary protocol version     4           cassandra.query.default.consistency           Cassandra query default consistency level            Available values: ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, ALL          ONE           cassandra.query.default.fetchSize     Cassandra q
 uery default fetch size     5000           cassandra.query.default.serial.consistency           Cassandra query default serial consistency level            Available values: SERIAL, LOCAL_SERIAL          SERIAL           cassandra.reconnection.policy             Cassandra Reconnection Policy.        Default = new ExponentialReconnectionPolicy(1000, 10 * 60 * 1000)        To specify your own policy, provide the fully qualified class name (FQCN) of your policy.        At runtime the interpreter will instantiate the policy using        Class.forName(FQCN)          DEFAULT           cassandra.retry.policy             Cassandra Retry Policy.        Default = DefaultRetryPolicy.INSTANCE        To specify your own policy, provide the fully qualified class name (FQCN) of your policy.        At runtime the interpreter will instantiate the policy using        Class.forName(FQCN)          DEFAULT           cassandra.socket.connection.timeout.millisecs     Cassandra socket default connection timeou
 t in millisecs     500           cassandra.socket.read.timeout.millisecs     Cassandra socket read timeout in millisecs     12000           cassandra.socket.tcp.no_delay     Cassandra socket TCP no delay     true           cassandra.speculative.execution.policy             Cassandra Speculative Execution Policy.        Default = NoSpeculativeExecutionPolicy.INSTANCE        To specify your own policy, provide the fully qualified class name (FQCN) of your policy.        At runtime the interpreter will instantiate the policy using        Class.forName(FQCN)          DEFAULT           cassandra.ssl.enabled             Enable support for connecting to a Cassandra cluster configured with SSL.        To connect to Cassandra configured with SSL use true        and provide a truststore file and password with the following options.          false           cassandra.ssl.truststore.path             Filepath for the truststore file to use for connection to Cassandra with SSL.                     cassandra.
 ssl.truststore.password             Password for the truststore file to use for connection to Cassandra with SSL.              Change Log3.0 (Zeppelin 0.8.0) :Update documentationUpdate interactive documentationAdd support for binary protocol V4Implement new @requestTimeOut runtime optionUpgrade Java driver version to 3.0.1Allow interpreter to add dynamic forms programmatically when using FormType.SIMPLEAllow dynamic form using default Zeppelin syntaxFixing typo on FallThroughPolicyLook for data in AngularObjectRegistry before creating dynamic formAdd missing support for ALTER statements2.0 (Zeppelin 0.8.0) :Update help menu and add changelogAdd Support for User Defined Functions, User Defined Aggregates and Materialized ViewsUpgrade Java driver version to 3.0.0-rc11.0 (Zeppelin 0.5.5-incubating) :Initial versionBugs & ContactsIf you encounter a bug for this interpreter, please create a JIRA ticket and ping me on Twitter at @doanduyhaiZeppelin Dynamic FormInterpreter Binding
  Mode",
+      "url": " /interpreter/cassandra",
+      "group": "interpreter",
+      "excerpt": "Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance."
+    }
+    ,
+    
+  
+
+    "setup-deployment-cdh": {
+      "title": "Apache Zeppelin on CDH",
+      "content"  : "Apache Zeppelin on CDH1. Import Cloudera QuickStart Docker imageCloudera has officially provided CDH Docker Hub in their own container. Please check this guide page for more information.You can import the Docker image by pulling it from Cloudera Docker Hub.docker pull cloudera/quickstart:latest2. Run dockerdocker run -it  -p 80:80  -p 4040:4040  -p 8020:8020  -p 8022:8022  -p 8030:8030  -p 8032:8032  -p 8033:8033  -p 8040:8040  -p 8042:8042  -p 8088:8088  -p 8480:8480  -p 8485:8485  -p 8888:8888  -p 9083:9083  -p 10020:10020  -p 10033:10033  -p 18088:18088  -p 19888:19888  -p 25000:25000  -p 25010:25010  -p 25020:25020  -p 50010:50010  -p 50020:50020  -p 50070:50070  -p 50075:50075  -h quickstart.cloudera --privileged=true  agitated_payne_backup /usr/bin/docker-quickstart;3. Verify running CDHTo verify the application is running well, check the web UI for HDFS on http://<hostname>:50070/ and YARN on http://<hostname>:8088/cluster.4. Co
 nfigure Spark interpreter in ZeppelinSet the following configurations in conf/zeppelin-env.sh.export MASTER=yarn-clientexport HADOOP_CONF_DIR=[your_hadoop_conf_path]export SPARK_HOME=[your_spark_home_path]HADOOP_CONF_DIR (Hadoop configuration path) is defined in /scripts/docker/spark-cluster-managers/cdh/hdfs_conf.Don't forget to set the Spark master to yarn-client on the Zeppelin Interpreters setting page like below.5. Run Zeppelin with Spark interpreterAfter running a single paragraph with the Spark interpreter in Zeppelin, browse http://<hostname>:8088/cluster/apps to check whether the Zeppelin application is running well.",
+      "url": " /setup/deployment/cdh",
+      "group": "setup/deployment",
+      "excerpt": "This document guides you through building and configuring the environment on CDH with Apache Zeppelin using Docker scripts."
+    }
+    ,
+    
+  
+
+    "setup-operation-configuration": {
+      "title": "Apache Zeppelin Configuration",
+      "content"  : "Apache Zeppelin ConfigurationZeppelin PropertiesThere are two places where you can configure Apache Zeppelin.Environment variables can be defined in conf/zeppelin-env.sh (confzeppelin-env.cmd for Windows).Java properties can be defined in conf/zeppelin-site.xml.If both are defined, then the environment variables will take priority.Mouse hover on each property and click  then you can get a link for that.      zeppelin-env.sh    zeppelin-site.xml    Default value    Description        ZEPPELIN_ADDR    zeppelin.server.addr    127.0.0.1    Zeppelin server binding address        ZEPPELIN_PORT    zeppelin.server.port    8080    Zeppelin server port        Note: Please make sure you're not using the same port as the      Zeppelin web application development port (default: 9000).        ZEPPELIN_SSL_PORT    zeppelin.server.ssl.port    8443    Zeppelin server SSL port (used when the ssl environment variable/property is set to true)        ZEPPELIN_JMX_ENABLE    N/A        Enable JMX by def
 ining "true"        ZEPPELIN_JMX_PORT    N/A    9996    Port number which JMX uses        ZEPPELIN_MEM    N/A    -Xmx1024m -XX:MaxPermSize=512m    JVM mem options        ZEPPELIN_INTP_MEM    N/A    ZEPPELIN_MEM    JVM mem options for interpreter process        ZEPPELIN_JAVA_OPTS    N/A        JVM options        ZEPPELIN_ALLOWED_ORIGINS    zeppelin.server.allowed.origins    *    Enables a way to specify a ',' separated list of allowed origins for REST and websockets.  e.g. http://localhost:8080        ZEPPELIN_CREDENTIALS_PERSIST    zeppelin.credentials.persist    true    Persist credentials in a JSON file (credentials.json)          ZEPPELIN_CREDENTIALS_ENCRYPT_KEY    zeppelin.credentials.encryptKey        If provided, encrypt passwords in the credentials.json file (passwords will be stored as plain-text otherwise)          N/A    zeppelin.anonymous.allowed    true    The anonymous user is allowed by default.        ZEPPELIN_SERVER_CONTEXT_PATH    zeppelin.server.co
 ntext.path    /    Context path of the web application        ZEPPELIN_SSL    zeppelin.ssl    false            ZEPPELIN_SSL_CLIENT_AUTH    zeppelin.ssl.client.auth    false            ZEPPELIN_SSL_KEYSTORE_PATH    zeppelin.ssl.keystore.path    keystore            ZEPPELIN_SSL_KEYSTORE_TYPE    zeppelin.ssl.keystore.type    JKS            ZEPPELIN_SSL_KEYSTORE_PASSWORD    zeppelin.ssl.keystore.password                ZEPPELIN_SSL_KEY_MANAGER_PASSWORD    zeppelin.ssl.key.manager.password                ZEPPELIN_SSL_TRUSTSTORE_PATH    zeppelin.ssl.truststore.path                ZEPPELIN_SSL_TRUSTSTORE_TYPE    zeppelin.ssl.truststore.type                ZEPPELIN_SSL_TRUSTSTORE_PASSWORD    zeppelin.ssl.truststore.password                ZEPPELIN_NOTEBOOK_HOMESCREEN    zeppelin.notebook.homescreen        Display note IDs on the Apache Zeppelin homescreen e.g. 2A94M5J1Z        ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE    zeppelin.notebook.homescreen.hide    false    Hide the note ID set by ZEPPELIN
 _NOTEBOOK_HOMESCREEN on the Apache Zeppelin homescreen. For the further information, please read Customize your Zeppelin homepage.        ZEPPELIN_WAR_TEMPDIR    zeppelin.war.tempdir    webapps    Location of the jetty temporary directory        ZEPPELIN_NOTEBOOK_DIR    zeppelin.notebook.dir    notebook    The root directory where notebook directories are saved        ZEPPELIN_NOTEBOOK_S3_BUCKET    zeppelin.notebook.s3.bucket    zeppelin    S3 Bucket where notebook files will be saved        ZEPPELIN_NOTEBOOK_S3_USER    zeppelin.notebook.s3.user    user    User name of an S3 buckete.g. bucket/user/notebook/2A94M5J1Z/note.json        ZEPPELIN_NOTEBOOK_S3_ENDPOINT    zeppelin.notebook.s3.endpoint    s3.amazonaws.com    Endpoint for the bucket        ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID    zeppelin.notebook.s3.kmsKeyID        AWS KMS Key ID to use for encrypting data in S3 (optional)        ZEPPELIN_NOTEBOOK_S3_EMP    zeppelin.notebook.s3.encryptionMaterialsProvider        Class name of a c
 ustom S3 encryption materials provider implementation to use for encrypting data in S3 (optional)        ZEPPELIN_NOTEBOOK_S3_SSE    zeppelin.notebook.s3.sse    false    Save notebooks to S3 with server-side encryption enabled        ZEPPELIN_NOTEBOOK_S3_SIGNEROVERRIDE    zeppelin.notebook.s3.signerOverride        Optional override to control which signature algorithm should be used to sign AWS requests        ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING    zeppelin.notebook.azure.connectionString        The Azure storage account connection stringe.g. DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey>        ZEPPELIN_NOTEBOOK_AZURE_SHARE    zeppelin.notebook.azure.share    zeppelin    Azure Share where the notebook files will be saved        ZEPPELIN_NOTEBOOK_AZURE_USER    zeppelin.notebook.azure.user    user    Optional user name of an Azure file sharee.g. share/user/notebook/2A94M5J1Z/note.json        ZEPPELIN_NOTEBOOK_STORAGE
     zeppelin.notebook.storage    org.apache.zeppelin.notebook.repo.GitNotebookRepo    Comma separated list of notebook storage locations        ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC    zeppelin.notebook.one.way.sync    false    If there are multiple notebook storage locations, should we treat the first one as the only source of truth?        ZEPPELIN_NOTEBOOK_PUBLIC    zeppelin.notebook.public    true    Make notebook public (set only owners) by default when created/imported. If set to false will add user to readers and writers as well, making it private and invisible to other users unless permissions are granted.        ZEPPELIN_INTERPRETERS    zeppelin.interpreters      org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,    ...              Comma separated interpreter configurations [Cl
 ass]       NOTE: This property is deprecated since Zeppelin-0.6.0 and will not be supported from Zeppelin-0.7.0.            ZEPPELIN_INTERPRETER_DIR    zeppelin.interpreter.dir    interpreter    Interpreter directory        ZEPPELIN_INTERPRETER_DEP_MVNREPO    zeppelin.interpreter.dep.mvnRepo    http://repo1.maven.org/maven2/    Remote principal repository for interpreter's additional dependency loading        ZEPPELIN_INTERPRETER_OUTPUT_LIMIT    zeppelin.interpreter.output.limit    102400    Output message from interpreter exceeding the limit will be truncated        ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT    zeppelin.interpreter.connect.timeout    30000    Interpreter process connect timeout in milliseconds        ZEPPELIN_DEP_LOCALREPO    zeppelin.dep.localrepo    local-repo    Local repository for dependency loader, e.g. visualization modules of npm.        ZEPPELIN_HELIUM_NODE_INSTALLER_URL    zeppelin.helium.node.installer.url    https://nodejs.org/dist/    Remot
 e Node installer url for Helium dependency loader        ZEPPELIN_HELIUM_NPM_INSTALLER_URL    zeppelin.helium.npm.installer.url    http://registry.npmjs.org/    Remote Npm installer url for Helium dependency loader        ZEPPELIN_HELIUM_YARNPKG_INSTALLER_URL    zeppelin.helium.yarnpkg.installer.url    https://github.com/yarnpkg/yarn/releases/download/    Remote Yarn package installer url for Helium dependency loader        ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE    zeppelin.websocket.max.text.message.size    1024000    Size(in characters) of the maximum text message that can be received by websocket.        ZEPPELIN_SERVER_DEFAULT_DIR_ALLOWED    zeppelin.server.default.dir.allowed    false    Enable directory listings on server.        ZEPPELIN_NOTEBOOK_GIT_REMOTE_URL    zeppelin.notebook.git.remote.url        GitHub's repository URL. It could be either the HTTP URL or the SSH URL. For example git@github.com:apache/zeppelin.git        ZEPPELIN_NOTEBOOK_GIT_REMOTE_USERNAME    z
 eppelin.notebook.git.remote.username    token    GitHub username. By default it is `token` to use GitHub's API        ZEPPELIN_NOTEBOOK_GIT_REMOTE_ACCESS_TOKEN    zeppelin.notebook.git.remote.access-token    token    GitHub access token to use GitHub's API. If username/password combination is used and not GitHub API, then this value is the password        ZEPPELIN_NOTEBOOK_GIT_REMOTE_ORIGIN    zeppelin.notebook.git.remote.origin    origin    GitHub remote name. Default is `origin`  SSL ConfigurationEnabling SSL requires a few configuration changes. First, you need to create certificates and then update necessary configurations to enable server side SSL and/or client side certificate authentication.Creating and configuring the CertificatesInformation about how to generate certificates and a keystore can be found here.A condensed example can be found in the top answer to this StackOverflow post.The keystore holds the private key and certificate on the server end. The truststore holds
  the trusted client certificates. Be sure that the path and password for these two stores are correctly configured in the password fields below. They can be obfuscated using the Jetty password tool. After Maven pulls in all the dependencies to build Zeppelin, one of the Jetty jars contains the Password tool. Invoke this command from the Zeppelin home build directory with the appropriate version, user, and password.java -cp ./zeppelin-server/target/lib/jetty-all-server-<version>.jar org.eclipse.jetty.util.security.Password <user> <password>If you are using a self-signed certificate, a certificate signed by an untrusted CA, or if client authentication is enabled, then the client must have a browser create exceptions for both the normal HTTPS port and WebSocket port. This can be done by trying to establish an HTTPS connection to both ports in a browser (e.g. if the ports are 443 and 8443, then visit https://127.0.0.1:443 and https://127.0.0.1:8443). This 
 step can be skipped if the server certificate is signed by a trusted CA and client auth is disabled.Configuring server side SSLThe following properties need to be updated in the zeppelin-site.xml in order to enable server side SSL.<property>  <name>zeppelin.server.ssl.port</name>  <value>8443</value>  <description>Server ssl port. (used when ssl property is set to true)</description></property><property>  <name>zeppelin.ssl</name>  <value>true</value>  <description>Should SSL be used by the servers?</description></property><property>  <name>zeppelin.ssl.keystore.path</name>  <value>keystore</value>  <description>Path to keystore relative to Zeppelin configuration directory</description>
 </property><property>  <name>zeppelin.ssl.keystore.type</name>  <value>JKS</value>  <description>The format of the given keystore (e.g. JKS or PKCS12)</description></property><property>  <name>zeppelin.ssl.keystore.password</name>  <value>change me</value>  <description>Keystore password. Can be obfuscated by the Jetty Password tool</description></property><property>  <name>zeppelin.ssl.key.manager.password</name>  <value>change me</value>  <description>Key Manager password. Defaults to keystore password. Can be obfuscated.</description></property>Enabling client side certificate authenticationThe following properties
  need to be updated in the zeppelin-site.xml in order to enable client side certificate authentication.<property>  <name>zeppelin.server.ssl.port</name>  <value>8443</value>  <description>Server ssl port. (used when ssl property is set to true)</description></property><property>  <name>zeppelin.ssl.client.auth</name>  <value>true</value>  <description>Should client authentication be used for SSL connections?</description></property><property>  <name>zeppelin.ssl.truststore.path</name>  <value>truststore</value>  <description>Path to truststore relative to Zeppelin configuration directory. Defaults to the keystore path</description></property><pro
 perty>  <name>zeppelin.ssl.truststore.type</name>  <value>JKS</value>  <description>The format of the given truststore (e.g. JKS or PKCS12). Defaults to the same type as the keystore type</description></property><property>  <name>zeppelin.ssl.truststore.password</name>  <value>change me</value>  <description>Truststore password. Can be obfuscated by the Jetty Password tool. Defaults to the keystore password</description></property>Storing user credentialsIn order to avoid having to re-enter credentials every time you restart/redeploy Zeppelin, you can store the user credentials. Zeppelin supports this via the ZEPPELIN_CREDENTIALS_PERSIST configuration.Please note that passwords will be stored in plain text by default. To encrypt the passwords, use the
  ZEPPELIN_CREDENTIALS_ENCRYPT_KEY config variable. This will encrypt passwords using the AES-128 algorithm.You can generate an appropriate encryption key any way you'd like - for instance, by using the openssl tool:openssl enc -aes-128-cbc -k secret -P -md sha1Important: storing your encryption key in a configuration file is not advised. Depending on your environment security needs, you may want to consider utilizing a credentials server, storing the ZEPPELIN_CREDENTIALS_ENCRYPT_KEY as an OS env variable, or any other approach that would not colocate the encryption key and the encrypted content (the credentials.json file).Obfuscating Passwords using the Jetty Password ToolSecurity best practices advise against using plain text passwords, and Jetty provides a password tool to help obfuscate the passwords used to access the KeyStore and TrustStore.The Password tool documentation can be found here.After using the tool:java -cp $ZEPPELIN_HOME/zeppelin-server/target/lib/jetty-util-9.2.15.v20160210.jar   
        org.eclipse.jetty.util.security.Password           password2016-12-15 10:46:47.931:INFO::main: Logging initialized @101mspasswordOBF:1v2j1uum1xtv1zej1zer1xtn1uvk1v1vMD5:5f4dcc3b5aa765d61d8327deb882cf99update your configuration with the obfuscated password :<property>  <name>zeppelin.ssl.keystore.password</name>  <value>OBF:1v2j1uum1xtv1zej1zer1xtn1uvk1v1v</value>  <description>Keystore password. Can be obfuscated by the Jetty Password tool</description></property>Create GitHub Access TokenWhen using GitHub to track notebooks, one can use GitHub's API for authentication. To create an access token, please use the following link https://github.com/settings/tokens.The value of the access token generated is set in the zeppelin.notebook.git.remote.access-token property.Note: After updating these configurations, Zeppelin server needs to be restarted.",
+      "url": " /setup/operation/configuration",
+      "group": "setup/operation",
+      "excerpt": "This page will guide you to configure Apache Zeppelin using either environment variables or Java properties. Also, you can configure SSL for Zeppelin."
+    }
+    ,
+    
+  
+
+    "usage-rest-api-configuration": {
+      "title": "Apache Zeppelin Configuration REST API",
+      "content"  : "Apache Zeppelin Configuration REST APIOverviewApache Zeppelin provides several REST APIs for interaction and remote activation of Zeppelin functionality.All REST APIs are available starting with the following endpoint http://[zeppelin-server]:[zeppelin-port]/api. Note that Apache Zeppelin REST APIs receive or return JSON objects, so it is recommended that you install a JSON viewer such as JSONView.If you work with Apache Zeppelin and find a need for an additional REST API, please file an issue or send us an email.Configuration REST API listList all key/value pairs of configurations              Description      This GET method returns all key/value pairs of configurations on the server.       Note: For security reasons, some pairs are not shown.              URL      http://[zeppelin-server]:[zeppelin-port]/api/configurations/all              Success code      200               Fail code       500                sample JSON response                    {  "status": "OK",
   "message": "",  "body": {    "zeppelin.war.tempdir": "webapps",    "zeppelin.notebook.homescreen.hide": "false",    "zeppelin.interpreter.remoterunner": "bin/interpreter.sh",    "zeppelin.notebook.s3.user": "user",    "zeppelin.server.port": "8089",    "zeppelin.dep.localrepo": "local-repo",    "zeppelin.ssl.truststore.type": "JKS",    "zeppelin.ssl.keystore.path": "keystore",    "zeppelin.notebook.s3.bucket": "zeppelin",    "zeppelin.server.addr": "0.0.0.0",    "zeppelin.ssl.client.auth": "false",
     "zeppelin.server.context.path": "/",    "zeppelin.ssl.keystore.type": "JKS",    "zeppelin.ssl.truststore.path": "truststore",    "zeppelin.interpreters": "org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkRInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.kylin.KylinInterpreter,org.apache.zeppelin.elasticsearch.ElasticsearchInterpreter,org.apache.zeppelin.scalding.ScaldingInterpreter",
     "zeppelin.ssl": "false",    "zeppelin.notebook.autoInterpreterBinding": "true",    "zeppelin.notebook.homescreen": "",    "zeppelin.notebook.storage": "org.apache.zeppelin.notebook.repo.VFSNotebookRepo",    "zeppelin.interpreter.connect.timeout": "30000",    "zeppelin.anonymous.allowed": "true",    "zeppelin.server.allowed.origins":"*",    "zeppelin.encoding": "UTF-8"  }}      List all prefix-matched key/value pairs of configurations              Description      This GET method returns all prefix-matched key/value pairs of configurations on the server.      Note: For security reasons, some pairs are not shown.  
             URL      http://[zeppelin-server]:[zeppelin-port]/api/configurations/prefix/[prefix]              Success code      200               Fail code       500                sample JSON response            {  "status": "OK",  "message": "",  "body": {    "zeppelin.ssl.keystore.type": "JKS",    "zeppelin.ssl.truststore.path": "truststore",    "zeppelin.ssl.truststore.type": "JKS",    "zeppelin.ssl.keystore.path": "keystore",    "zeppelin.ssl": "false",    "zeppelin.ssl.client.auth": "false"  }}            ",
+      "url": " /usage/rest_api/configuration",
+      "group": "usage/rest_api",
+      "excerpt": "This page contains Apache Zeppelin Configuration REST API information."
+    }
+    ,
+    
+  
+
+    "usage-rest-api-credential": {
+      "title": "Apache Zeppelin Credential REST API",
+      "content"  : "Apache Zeppelin Credential REST APIOverviewApache Zeppelin provides several REST APIs for interaction and remote activation of Zeppelin functionality.All REST APIs are available starting with the following endpoint http://[zeppelin-server]:[zeppelin-port]/api. Note that Apache Zeppelin REST APIs receive or return JSON objects, so it is recommended that you install a JSON viewer such as JSONView.If you work with Apache Zeppelin and find a need for an additional REST API, please file an issue or send us an email.Credential REST API ListList Credential information              Description      This GET method returns all key/value pairs of the credential information on the server.              URL      http://[zeppelin-server]:[zeppelin-port]/api/credential              Success code      200               Fail code       500                sample JSON response                    {  "status": "OK",  "message": 
 "",  "body": {    "userCredentials":{      "entity1":{        "username":"user1",        "password":"password1"      },      "entity2":{        "username":"user2",        "password":"password2"      }    }  }}      Create Credential Information              Description      This PUT method creates the credential information with new properties.              URL      http://[zeppelin-server]:[zeppelin-port]/api/credential/              Success code      200              Fail code       500               Sample JSON input              {  "entity": "e1",  "username": "user",  "password": "password"}                            Sample JSON response 
              {  "status": "OK"}                    Delete all Credential Information              Description      This DELETE method deletes the credential information.              URL      http://[zeppelin-server]:[zeppelin-port]/api/credential              Success code      200               Fail code       500               Sample JSON response              {"status":"OK"}            Delete a Credential entity              Description      This DELETE method deletes a given credential entity.              URL      http://[zeppelin-server]:[zeppelin-port]/api/credential/[entity]              Success code      200               Fail code       500               Sample JSON response              {"status":"OK"}            ",
+      "url": " /usage/rest_api/credential",
+      "group": "usage/rest_api",
+      "excerpt": "This page contains Apache Zeppelin Credential REST API information."
+    }
+    ,
+    
+  
+
+    "usage-other-features-cron-scheduler": {
+      "title": "Running a Notebook on a Given Schedule Automatically",
+      "content"  : "Running a Notebook on a Given Schedule AutomaticallyApache Zeppelin provides a cron scheduler for each notebook. You can run a notebook on a given schedule automatically by setting up a cron scheduler on the notebook.Setting up a cron scheduler on a notebookClick the clock icon on the tool bar and open a cron scheduler dialog box.There are the following items which you can input or set:PresetYou can set a cron schedule easily by clicking each option such as 1m and 5m. The login user is set as a cron executing user automatically. You can also clear the cron schedule settings by clicking None.Cron expressionYou can set the cron schedule by filling in this form. Please see Cron Trigger Tutorial for the available cron syntax.Cron executing user (Removed as of 0.8, which enforces that the cron executing user is the note owner for security purposes)You can set the cron executing user by filling in this form and pressing the Enter key.After execution stop the interpreter
 When this checkbox is set to "on", the interpreters which are bound to the notebook are stopped automatically after the cron execution. This feature is useful if you want to release the interpreter resources after the cron execution.Note: A cron execution is skipped if one of the paragraphs is in a state of RUNNING or PENDING no matter whether it is executed automatically (i.e. by the cron scheduler) or manually by a user opening this notebook.Enable cronSet property zeppelin.notebook.cron.enable to true in $ZEPPELIN_HOME/conf/zeppelin-site.xml to enable Cron feature.Run cron selectively on foldersIn $ZEPPELIN_HOME/conf/zeppelin-site.xml make sure the property zeppelin.notebook.cron.enable is set to true, and then set property zeppelin.notebook.cron.folders to the desired folders as comma-separated values, e.g. *yst*, Sys?em, System. This property accepts the wildcards * and ?.",
+      "url": " /usage/other_features/cron_scheduler",
+      "group": "usage/other_features",
+      "excerpt": "You can run a notebook on a given schedule automatically by setting up a cron scheduler on the notebook."
+    }
+    ,
+    
+  
+
+    "usage-other-features-customizing-homepage": {
+      "title": "Customizing Apache Zeppelin homepage",

[... 777 lines stripped ...]

