Machine Learning Analytics with Zeppelin

Apache Zeppelin is an interactive computational environment built on Apache Spark, similar to the IPython Notebook. With Apache PredictionIO (incubating) and Spark SQL, you can easily analyze your collected events while you are developing or tuning your engine.

Prerequisites

The following instructions assume that you have the command sbt accessible in your shell's search path. Alternatively, you can use the sbt command that comes with Apache PredictionIO (incubating) at $PIO_HOME/sbt/sbt.

Export Events to Apache Parquet

PredictionIO supports exporting your events to Apache Parquet, a columnar storage format that enables fast queries.

Let's export the data we imported in Recommendation Engine Template Quick Start, and assume the App ID is 1.

$ $PIO_HOME/bin/pio export --appid 1 --output /tmp/movies --format parquet

After the command has finished successfully, you should see something similar to the following.

root
 |-- creationTime: string (nullable = true)
 |-- entityId: string (nullable = true)
 |-- entityType: string (nullable = true)
 |-- event: string (nullable = true)
 |-- eventId: string (nullable = true)
 |-- eventTime: string (nullable = true)
 |-- properties: struct (nullable = true)
 |    |-- rating: double (nullable = true)
 |-- targetEntityId: string (nullable = true)
 |-- targetEntityType: string (nullable = true)
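As a rough illustration of how exported rows relate to the original event JSON: top-level event fields become top-level columns, while the nested properties object becomes a struct column, so a rating is addressed as properties.rating. This is a plain-Python sketch with sample values, not PredictionIO code:

```python
import json

# A "rate" event as it might appear in the Event Server (sample values)
raw = ('{"event":"rate","entityType":"user","entityId":"3",'
       '"targetEntityType":"item","targetEntityId":"2",'
       '"properties":{"rating":4.0},"eventTime":"2014-11-21T01:04:14.729Z"}')

event = json.loads(raw)

# Top-level fields map to top-level columns; nested properties map to
# fields of the `properties` struct column (queried as properties.rating).
row = {
    "entityId": event["entityId"],
    "event": event["event"],
    "targetEntityId": event["targetEntityId"],
    "properties.rating": event["properties"]["rating"],
}
print(row["properties.rating"])  # 4.0
```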

Building Zeppelin for Apache Spark 1.2+

Start by cloning Zeppelin.

$ git clone https://github.com/apache/incubator-zeppelin.git

Build Zeppelin with Hadoop 2.4 and Spark 1.2 profiles.

$ cd incubator-zeppelin
$ mvn clean package -Pspark-1.2 -Dhadoop.version=2.4.0 -Phadoop-2.4 -DskipTests

Now you should have working Zeppelin binaries.

Preparing Zeppelin

First, start Zeppelin.

$ bin/zeppelin-daemon.sh start

By default, you should be able to access Zeppelin via web browser at http://localhost:8080. Create a new notebook and put the following in the first cell.

sqlc.parquetFile("/tmp/movies").registerTempTable("events")


Performing Analysis with Zeppelin

If all steps above ran successfully, you should have a ready-to-use analytics environment by now. Let's try a few examples to see if everything is functional.

In the second cell, put in this piece of code and run it.

%sql
SELECT entityType, event, targetEntityType, COUNT(*) AS c FROM events
GROUP BY entityType, event, targetEntityType

Summary of Events

We can also easily plot a pie chart.

%sql
SELECT event, COUNT(*) AS c FROM events GROUP BY event

Summary of Events in a Pie Chart

And see a breakdown of rating values.

%sql
SELECT properties.rating AS r, COUNT(*) AS c FROM events
WHERE properties.rating IS NOT NULL GROUP BY properties.rating ORDER BY r

Breakdown of Rating Values

Happy analyzing!

Using Analytics Tools

Event Server collects and unifies data for your application from multiple channels.

Data can be exported to the Apache Parquet format with pio export for fast analysis. The following analytics tools are currently supported:

  1. IPython Notebook

  2. Tableau

  3. Zeppelin

Importing Data in Batch

If you have a large amount of data to start with, performing a batch import will be much faster than sending every event over an HTTP connection.

Preparing Input File

The import tool expects its input to be a file stored either on the local filesystem or on HDFS. Each line of the file should be a JSON object string representing an event. For more information about the format of the event JSON object, please refer to this page.

Shown below is an example that contains 5 events ready to be imported to the Event Server.

{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"0","eventTime":"2014-11-21T01:04:14.716Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"1","eventTime":"2014-11-21T01:04:14.722Z"}
{"event":"rate","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"2","properties":{"rating":1.0},"eventTime":"2014-11-21T01:04:14.729Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"7","eventTime":"2014-11-21T01:04:14.735Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"8","eventTime":"2014-11-21T01:04:14.741Z"}

Please make sure your import file does not contain any empty lines; an empty line is treated as a null object and will cause an error during import.
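Before running the import, it can save a round trip to check the file for the pitfalls above. The helper below is a hypothetical sketch (not part of PredictionIO) that flags empty lines, malformed JSON, and lines missing a few of the required fields:

```python
import json

def validate_events_file(path):
    """Check a batch import file: every line must be non-empty, valid JSON,
    and contain at least the "event", "entityType" and "entityId" fields."""
    problems = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            stripped = line.strip()
            if not stripped:
                # Empty lines break pio import, so report them explicitly.
                problems.append((lineno, "empty line"))
                continue
            try:
                event = json.loads(stripped)
            except ValueError as exc:
                problems.append((lineno, "invalid JSON: %s" % exc))
                continue
            for field in ("event", "entityType", "entityId"):
                if field not in event:
                    problems.append((lineno, "missing field %r" % field))
    return problems
```

An empty result means the file passed these basic checks; any tuples returned identify the offending line numbers.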

Use SDK to Prepare Batch Input File

Some of the Apache PredictionIO (incubating) SDKs also provide a FileExporter client. You may use it to prepare the JSON file as described above. The FileExporter creates events in the same way as EventClient, except that the events are written to a JSON file instead of being sent to the Event Server. The written JSON file can then be used by batch import.

import predictionio
from datetime import datetime
import pytz

# Create a FileExporter and specify "my_events.json" as the destination file
exporter = predictionio.FileExporter(file_name="my_events.json")

event_properties = {
    "someProperty": "value1",
    "anotherProperty": "value2",
    }
# write the events to a file
event_response = exporter.create_event(
    event="my_event",
    entity_type="user",
    entity_id="uid",
    target_entity_type="item",
    target_entity_id="iid",
    properties=event_properties,
    event_time=datetime(2014, 12, 13, 21, 38, 45, 618000, pytz.utc))

# ...

# close the FileExporter when finished writing all events
exporter.close()
(Examples for the other SDKs are coming soon.)

Import Events from Input File

Importing events from a file can be done easily using the command line interface. Assuming that pio is in your search path, your App ID is 123, and the input file my_events.json is in your current working directory:

$ pio import --appid 123 --input my_events.json

After a brief while, the tool should return to the console without any error. Congratulations! You have successfully imported your events.
