From: donald@apache.org
To: commits@predictionio.incubator.apache.org
Date: Wed, 02 Nov 2016 18:07:11 -0000
Subject: [06/51] [abbrv] [partial] incubator-predictionio-site git commit: Documentation based on apache/incubator-predictionio#03e99814384134331cce558c9d89d93f9a7df347

http://git-wip-us.apache.org/repos/asf/incubator-predictionio-site/blob/25938169/templates/leadscoring/quickstart/index.html
----------------------------------------------------------------------
diff --git a/templates/leadscoring/quickstart/index.html b/templates/leadscoring/quickstart/index.html
new file mode 100644
index 0000000..414561c
--- /dev/null
+++ b/templates/leadscoring/quickstart/index.html
@@ -0,0 +1,487 @@

Quick Start - Lead Scoring Engine Template

Overview

This engine template predicts the probability that a user will convert (a conversion event by the user) in the current session.

This template requires PredictionIO version >= 0.9.0

Usage

Event Data Requirements

By default, the template requires the following events to be collected:

  • 'page view' events with session ID
  • the first page view event can optionally provide the browser and referrer ID
  • user 'buy' event with session ID

The landing page ID, referrer ID, browser information and user's buy event will be used to train the model.

You can customize what the "conversion" event is. By default it is the "buy" item event, but it can be changed to another event such as "subscribe".

Input Query

  • Landing page ID
  • Referrer ID
  • Browser

Output PredictedResult

  • score
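
As a rough illustration of these two shapes, the sketch below models the query and the predicted result as small Python data classes. The class names mirror the headings above, but the Python attribute names and the `to_json`/`from_json` helpers are hypothetical; only the JSON field names (`landingPageId`, `referrerId`, `browser`, `score`) come from this quick start.

```python
import json
from dataclasses import dataclass


@dataclass
class Query:
    """Input query: the three fields listed under "Input Query"."""
    landing_page_id: str
    referrer_id: str
    browser: str

    def to_json(self):
        # Serialize using the JSON field names the engine expects.
        return json.dumps({
            "landingPageId": self.landing_page_id,
            "referrerId": self.referrer_id,
            "browser": self.browser,
        })


@dataclass
class PredictedResult:
    """Output: a single numeric score."""
    score: float

    @classmethod
    def from_json(cls, text):
        # Parse a response body such as {"score": 0.74}.
        return cls(score=float(json.loads(text)["score"]))


query = Query("example.com/page9", "referrer10.com", "Firefox")
print(query.to_json())
print(PredictedResult.from_json('{"score":0.7466666666666667}').score)
```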

1. Install and Run PredictionIO

First, install PredictionIO 0.10.0-incubating if you have not already done so.

Let's say you have installed PredictionIO at /home/yourname/PredictionIO/. For convenience, add PredictionIO's binary command path to your PATH, i.e. /home/yourname/PredictionIO/bin:

$ PATH=$PATH:/home/yourname/PredictionIO/bin; export PATH

If you launched a PredictionIO AWS instance, the binaries are located at /opt/PredictionIO/bin.

Once you have completed the installation process, please make sure all the components (PredictionIO Event Server, Elasticsearch, and HBase) are up and running.

If you launched a PredictionIO AWS instance, you can skip pio-start-all. All components should have been started automatically.

If you are using PostgreSQL or MySQL, run the following to start PredictionIO Event Server:

$ pio eventserver &

If instead you are running HBase and Elasticsearch, run the following to start the PredictionIO Event Server, HBase, and Elasticsearch together:

$ pio-start-all

You can check the status by running:

$ pio status

If everything is OK, you should see output ending with the following:

...
(sleeping 5 seconds for all messages to show up...)
Your system is all ready to go.

To further troubleshoot, please see FAQ - Using PredictionIO.

2. Create a new Engine from an Engine Template

Now let's create a new engine called MyLeadScoring by downloading the Lead Scoring Engine Template. Go to a directory where you want to put your engine and run the following:

$ pio template get PredictionIO/template-scala-parallel-leadscoring MyLeadScoring
$ cd MyLeadScoring

A new directory MyLeadScoring is created, where you can find the downloaded engine template.

3. Generate an App ID and Access Key

You will need to create a new App in PredictionIO to store all the data of your app. The data collected will be used for machine learning modeling.

Let's assume you want to use this engine in an application named "MyApp1". Run the following to create a new app "MyApp1":

$ pio app new MyApp1

You should find the following in the console output:

...
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$]       Name: MyApp1
[INFO] [App$]         ID: 1
[INFO] [App$] Access Key: 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b3F

Note that an App ID and an Access Key are created for the app "MyApp1". You will need the Access Key when you collect data with the Event Server for this app.

You can list all the apps you have created, along with their IDs and Access Keys, by running the following command:

$ pio app list

You should see a list of apps created. For example:

[INFO] [App$]                 Name |   ID |                                                       Access Key | Allowed Event(s)
[INFO] [App$]               MyApp1 |    1 | 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b3F | (all)
[INFO] [App$]               MyApp2 |    2 | io5lz6Eg4m3Xe4JZTBFE13GMAf1dhFl6ZteuJfrO84XpdOz9wRCrDU44EUaYuXq5 | (all)
[INFO] [App$] Finished listing 2 app(s).

4. Collecting Data

Next, let's collect training data for this engine. By default, the Lead Scoring Engine Template supports the following entities: user, page, and item. A user views a page and buys an item.

Note that a "sessionId" property is required to indicate that these events happen in the same session. On a user's first page view, you should also specify the optional "referrerId" and "browser" properties. These record where the user came from and which browser they are using.

In summary, this template requires user-view-page events and user-buy-item events that carry the session ID, referrer ID, and browser properties.
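
The requirements above can be checked before an event is sent. The following is a minimal sketch, assuming the default "view" and "buy" events; the function name `event_problems` is hypothetical and is not part of any PredictionIO SDK.

```python
def event_problems(event):
    """Return a list of problems with an event dict; an empty list means it looks usable."""
    problems = []
    name = event.get("event")
    if name not in ("view", "buy"):
        problems.append("event should be 'view' or 'buy'")
    if event.get("entityType") != "user":
        problems.append("entityType should be 'user'")
    # A view targets a page; a buy targets an item.
    expected_target = {"view": "page", "buy": "item"}.get(name)
    if expected_target and event.get("targetEntityType") != expected_target:
        problems.append("targetEntityType should be '%s'" % expected_target)
    # sessionId ties the events of one session together and is required.
    if not event.get("properties", {}).get("sessionId"):
        problems.append("properties.sessionId is required")
    return problems


good = {
    "event": "view", "entityType": "user", "entityId": "u0",
    "targetEntityType": "page", "targetEntityId": "example.com/page0",
    "properties": {"sessionId": "akdj230fj8ass"},
}
bad = {"event": "buy", "entityType": "user", "entityId": "u0",
       "targetEntityType": "item", "targetEntityId": "i0"}
print(event_problems(good))
print(event_problems(bad))
```

If you customize the conversion event (for example to "subscribe"), the allowed event names in this check would need to change accordingly.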

You can send these events to the PredictionIO Event Server in real time by making an HTTP request or by using one of the provided SDKs. Please see App Integration Overview for more details on how to integrate your app with an SDK.

Let's try sending events to the Event Server with the following curl commands (the corresponding SDK code is shown after each curl example).

Replace <ACCESS_KEY> with the Access Key generated in the steps above. Note that localhost:7070 is the default URL of the Event Server.

For convenience, assign your access key to a shell variable by running:

$ ACCESS_KEY=<ACCESS_KEY>

For example, when a user with ID u0 views the page "example.com/page0" at time 2014-11-02T09:39:45.618-08:00 with session ID "akdj230fj8ass", you can send the event to the Event Server (the current time will be used if eventTime is not specified). Run the following curl command:

$ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "event" : "view",
  "entityType" : "user",
  "entityId" : "u0",
  "targetEntityType" : "page",
  "targetEntityId" : "example.com/page0",
  "properties" : {
    "sessionId" : "akdj230fj8ass",
    "referrerId" : "referrer0.com",
    "browser" : "Firefox"
  },
  "eventTime" : "2014-11-02T09:39:45.618-08:00"
}'

Python SDK:

import predictionio

client = predictionio.EventClient(
  access_key=<ACCESS KEY>,
  url=<URL OF EVENTSERVER>,
  threads=5,
  qsize=500
)

# A user views a page
client.create_event(
  event="view",
  entity_type="user",
  entity_id=<USER ID>,
  target_entity_type="page",
  target_entity_id=<PAGE ID>,
  properties = {
    "sessionId": <SESSION ID>, # required
    "referrerId": <REFERRER ID>, # optional, but specify it if you have this information when the user views the landing page
    "browser": <BROWSER> # optional, but specify it if you have this information when the user views the landing page
  }
)

PHP SDK:

<?php
require_once("vendor/autoload.php");
use predictionio\EventClient;

$client = new EventClient(<ACCESS KEY>, <URL OF EVENTSERVER>);

// A user views a page
$client->createEvent(array(
  'event' => 'view',
  'entityType' => 'user',
  'entityId' => <USER ID>,
  'targetEntityType' => 'page',
  'targetEntityId' => <PAGE ID>,
  'properties' => array(
    'sessionId' => <SESSION ID>,
    'referrerId' => <REFERRER ID>,
    'browser' => <BROWSER>
  )
));
?>

Ruby SDK:

# Create a client object.
client = PredictionIO::EventClient.new(<ACCESS KEY>, <URL OF EVENTSERVER>)

# A user views a page.
client.create_event(
  'view',
  'user',
  <USER ID>, {
    'targetEntityType' => 'page',
    'targetEntityId' => <PAGE ID>,
    'properties' => {
      'sessionId' => <SESSION ID>,
      'referrerId' => <REFERRER ID>,
      'browser' => <BROWSER>
    }
  }
)

Java SDK:

import org.apache.predictionio.Event;
import org.apache.predictionio.EventClient;

import com.google.common.collect.ImmutableList;

EventClient client = new EventClient(<ACCESS KEY>, <URL OF EVENTSERVER>);

// A user views a page
Event viewEvent = new Event()
    .event("view")
    .entityType("user")
    .entityId(<USER_ID>)
    .targetEntityType("page")
    .targetEntityId(<PAGE_ID>)
    .property("sessionId", "<SESSION ID>")
    .property("referrerId", "<REFERRER ID>")
    .property("browser", "<BROWSER>");
client.createEvent(viewEvent);
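
If you prefer not to depend on an SDK, the same view event can be assembled with the Python standard library. This is only a sketch: the helper name `build_event_request` is hypothetical, `YOUR_ACCESS_KEY` is a placeholder, and the request is built but deliberately not sent. It also shows one way to produce an eventTime string in the format used by the curl example.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
from urllib.request import Request


def build_event_request(access_key, event, base_url="http://localhost:7070"):
    """Build (but do not send) a POST request for the Event Server."""
    url = base_url + "/events.json?" + urlencode({"accessKey": access_key})
    body = json.dumps(event).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


# 2014-11-02 09:39:45.618 at UTC-08:00, as in the curl example.
event_time = datetime(2014, 11, 2, 9, 39, 45, 618000,
                      tzinfo=timezone(timedelta(hours=-8)))

view_event = {
    "event": "view",
    "entityType": "user",
    "entityId": "u0",
    "targetEntityType": "page",
    "targetEntityId": "example.com/page0",
    "properties": {
        "sessionId": "akdj230fj8ass",
        "referrerId": "referrer0.com",
        "browser": "Firefox",
    },
    "eventTime": event_time.isoformat(timespec="milliseconds"),
}

request = build_event_request("YOUR_ACCESS_KEY", view_event)
print(request.full_url)
print(view_event["eventTime"])
# To actually send it, pass the request to urllib.request.urlopen().
```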

In the same browsing session "akdj230fj8ass", when the user with ID u0 buys an item i0 at time 2014-11-02T09:42:00.123-08:00 (the current time will be used if eventTime is not specified), you can send the following buy event. Run the following curl command:

$ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "event" : "buy",
  "entityType" : "user",
  "entityId" : "u0",
  "targetEntityType" : "item",
  "targetEntityId" : "i0",
  "properties" : {
    "sessionId" : "akdj230fj8ass"
  },
  "eventTime" : "2014-11-02T09:42:00.123-08:00"
}'

Python SDK:

# A user buys an item
client.create_event(
  event="buy",
  entity_type="user",
  entity_id=<USER ID>,
  target_entity_type="item",
  target_entity_id=<ITEM ID>,
  properties = {
    "sessionId": <SESSION ID>, # required
  }
)

PHP SDK:

<?php
// A user buys an item
$client->createEvent(array(
  'event' => 'buy',
  'entityType' => 'user',
  'entityId' => <USER ID>,
  'targetEntityType' => 'item',
  'targetEntityId' => <ITEM ID>,
  'properties' => array(
    'sessionId' => <SESSION ID>
  )
));

?>

Ruby SDK:

# A user buys an item.
client.create_event(
  'buy',
  'user',
  <USER ID>, {
    'targetEntityType' => 'item',
    'targetEntityId' => <ITEM ID>,
    'properties' => {
      'sessionId' => <SESSION ID>
    }
  }
)

Java SDK:

// A user buys an item
Event buyEvent = new Event()
    .event("buy")
    .entityType("user")
    .entityId(<USER_ID>)
    .targetEntityType("item")
    .targetEntityId(<ITEM_ID>)
    .property("sessionId", "<SESSION ID>");
client.createEvent(buyEvent);

Query Event Server

Now let's query the EventServer and see if these events are imported successfully.

Go to the following URL in your browser:

http://localhost:7070/events.json?accessKey=<YOUR_ACCESS_KEY>

or run the following command in terminal:

$ curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"

Note that you should quote the entire URL by using single or double quotes when you run the curl command.

It should return the imported events in JSON format. You can refer to Event Server Debugging Recipes for more ways to query the Event Server.
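
To get a quick sense of what was imported, the returned JSON can be summarized with a few lines of Python. This sketch assumes the response body is a JSON array of event objects, each with an "event" field; the helper name `count_events` and the shortened sample body are made up for illustration.

```python
import json
from collections import Counter


def count_events(response_text):
    """Count events by name in a JSON array of event objects."""
    events = json.loads(response_text)
    return Counter(event["event"] for event in events)


# A shortened, made-up response body for illustration only.
sample = json.dumps([
    {"event": "view", "entityId": "u0",
     "properties": {"sessionId": "akdj230fj8ass"}},
    {"event": "buy", "entityId": "u0",
     "properties": {"sessionId": "akdj230fj8ass"}},
])

counts = count_events(sample)
print(dict(counts))
```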

Import More Sample Data

This engine requires more data in order to train a useful model. Instead of sending more events one by one in real time, for the purposes of this quick start we are going to use a script to import more events in batch.

A Python import script, import_eventserver.py, is provided to import sample data. The sample data includes 50 sessions of events. In each session, a randomly selected user (with user ID "u1" to "u10") lands on a page (randomly selected from example.com/page1 to example.com/page20) with referrerId (randomly selected from referrer1.com to referrer10.com) and browser information. The user may view more pages, and may or may not buy an item (with item ID "i1" to "i50").
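
To make the shape of this sample data concrete, here is a sketch that generates sessions with the same ID ranges. It is not the actual import_eventserver.py: the function names, the session ID format, the browser list beyond "Firefox", the number of extra page views, and the purchase probability are all assumptions chosen for illustration.

```python
import random

# Only "Firefox" appears in this quick start; the other names are assumptions.
BROWSERS = ["Firefox", "Chrome", "Safari"]


def random_page(rng):
    """Pick a page between example.com/page1 and example.com/page20."""
    return "example.com/page%d" % rng.randint(1, 20)


def view(user, session_id, page, extra=None):
    """Build a user-view-page event dict."""
    properties = {"sessionId": session_id}
    properties.update(extra or {})
    return {"event": "view", "entityType": "user", "entityId": user,
            "targetEntityType": "page", "targetEntityId": page,
            "properties": properties}


def make_sessions(n_sessions=50, seed=0):
    """Generate sessions shaped like the sample data described above."""
    rng = random.Random(seed)
    sessions = []
    for index in range(n_sessions):
        user = "u%d" % rng.randint(1, 10)
        session_id = "session-%d" % index  # made-up ID format
        # The landing page view carries the referrer and browser.
        events = [view(user, session_id, random_page(rng), {
            "referrerId": "referrer%d.com" % rng.randint(1, 10),
            "browser": rng.choice(BROWSERS),
        })]
        # Up to three further page views; the upper bound is arbitrary.
        for _ in range(rng.randint(0, 3)):
            events.append(view(user, session_id, random_page(rng)))
        if rng.random() < 0.5:  # arbitrary purchase probability
            events.append({
                "event": "buy", "entityType": "user", "entityId": user,
                "targetEntityType": "item",
                "targetEntityId": "i%d" % rng.randint(1, 50),
                "properties": {"sessionId": session_id},
            })
        sessions.append(events)
    return sessions


sessions = make_sessions()
print(len(sessions), "sessions,", sum(len(s) for s in sessions), "events")
```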

First, you will need to install the Python SDK in order to run the sample data import script. To install the Python SDK, run:

$ pip install predictionio

or

$ easy_install predictionio

You may need sudo access if you run into permission issues (e.g. sudo pip install predictionio).

Make sure you are under the MyLeadScoring directory. Execute the following to import the data (Replace the value of access_key parameter with your Access Key):

$ cd MyLeadScoring
$ python data/import_eventserver.py --access_key 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b3F

You should see the following output:

...
User u8 buys item i13
session c347980abdf711e4b135b8e8560679ba
User u5 lands on page example.com/page11 referrer referrer4.com browser Firefox
User u5 views page example.com/page8
User u5 views page example.com/page17
User u5 buys item i5
166 events are imported.

If you see the error TypeError: __init__() got an unexpected keyword argument 'access_key', please update the Python SDK to the latest version.

You can query the event server again as described previously to check the imported events.

5. Deploy the Engine as a Service

Now you can build, train, and deploy the engine. First, make sure you are under the MyLeadScoring directory.

$ cd MyLeadScoring

Engine.json

Under the directory, you should find an engine.json file; this is where you specify parameters for the engine.

Modify this file so that the appName parameter matches the App Name you created earlier (e.g. "MyApp1" if you followed the quick start).

  ...
  "datasource": {
    "params" : {
      "appName": "MyApp1"
    }
  },
  ...

You may see appId in engine.json instead, which means you are using an old template. In this case, make sure the appId defined in the file matches your App ID. Alternatively, you can download the latest version of the template or follow our upgrade instructions to modify the template to use appName as the parameter.
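
A small check like the one below can confirm which app the engine is configured for before you build. It is a sketch only: the helper name `app_name_from_engine_json` is hypothetical, and it assumes engine.json is plain JSON with the datasource.params.appName layout shown above.

```python
import json


def app_name_from_engine_json(text):
    """Return datasource.params.appName, or None if it is absent."""
    config = json.loads(text)
    return config.get("datasource", {}).get("params", {}).get("appName")


# A minimal stand-in for the relevant part of engine.json.
sample = '{"datasource": {"params": {"appName": "MyApp1"}}}'
print(app_name_from_engine_json(sample))
```

Against a real engine directory you would pass the file contents instead, for example `app_name_from_engine_json(open("engine.json").read())`. A result of None may indicate an older template that still uses appId.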

Building

Start with building your MyLeadScoring engine. Run the following command:

$ pio build --verbose

This command may take a few minutes the first time; subsequent builds should take less than a minute. You can also run it without --verbose if you don't want to see all the log messages.

Upon successful build, you should see a console message similar to the following.

[INFO] [Console$] Your engine is ready for training.

Training the Predictive Model

To train your engine, run the following command:

$ pio train

When your engine is trained successfully, you should see a console message similar to the following.

[INFO] [CoreWorkflow$] Training completed successfully.

Deploying the Engine

Now your engine is ready to deploy. Run:

$ pio deploy

When the engine is deployed successfully and running, you should see a console message similar to the following:

[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Bind successful. Ready to serve.

Do not kill the deployed engine process.

By default, the deployed engine binds to http://localhost:8000. You can visit that page in your web browser to check its status.


6. Use the Engine

Now you can retrieve predictions. When a user lands on your page "example.com/page9" with referrer "referrer10.com" and browser "Firefox", you can get the predicted lead score by sending the JSON '{ "landingPageId" : "example.com/page9", "referrerId" : "referrer10.com", "browser": "Firefox" }' to the deployed engine. The engine returns a JSON object containing the score.

Simply send a query by making an HTTP request or through the EngineClient of an SDK.

With the deployed engine running, open another terminal and run the following curl command, or use an SDK to send the query:

$ curl -H "Content-Type: application/json" \
-d '{
  "landingPageId" : "example.com/page9",
  "referrerId" : "referrer10.com",
  "browser": "Firefox" }' \
http://localhost:8000/queries.json

Python SDK:

import predictionio
engine_client = predictionio.EngineClient(url="http://localhost:8000")
print(engine_client.send_query({
  "landingPageId" : "example.com/page9",
  "referrerId" : "referrer10.com",
  "browser": "Firefox"
}))

PHP SDK:

<?php
require_once("vendor/autoload.php");
use predictionio\EngineClient;

$client = new EngineClient('http://localhost:8000');

$response = $client->sendQuery(array(
  'landingPageId' => 'example.com/page9',
  'referrerId' => 'referrer10.com',
  'browser' => 'Firefox'
));

print_r($response);

?>

Ruby SDK:

# Create client object.
client = PredictionIO::EngineClient.new('http://localhost:8000')

# Query PredictionIO.
response = client.send_query(
  'landingPageId' => 'example.com/page9',
  'referrerId' => 'referrer10.com',
  'browser' => 'Firefox'
)

puts response

Java SDK:

import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableList;
import com.google.gson.JsonObject;

import org.apache.predictionio.EngineClient;

// create client object
EngineClient engineClient = new EngineClient("http://localhost:8000");

// query
JsonObject response = engineClient.sendQuery(ImmutableMap.<String, Object>of(
  "landingPageId", "example.com/page9",
  "referrerId", "referrer10.com",
  "browser", "Firefox"
));

The following is a sample JSON response:

{"score":0.7466666666666667}
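
For completeness, the query can also be sent without an SDK. The sketch below mirrors the curl command using only the Python standard library; the helper names `build_query_request` and `fetch_score` are hypothetical, and `fetch_score` only works while the deployed engine is running, so the example stops at building the request.

```python
import json
from urllib.request import Request, urlopen


def build_query_request(query, url="http://localhost:8000/queries.json"):
    """Build (but do not send) a POST request carrying the query as JSON."""
    body = json.dumps(query).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


def fetch_score(query):
    """Send the query and return the score; requires the deployed engine."""
    with urlopen(build_query_request(query)) as response:
        return json.loads(response.read().decode("utf-8"))["score"]


request = build_query_request({
    "landingPageId": "example.com/page9",
    "referrerId": "referrer10.com",
    "browser": "Firefox",
})
print(request.get_method(), request.full_url)
```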

MyLeadScoring is now running.

To update the model periodically with new data, set up a cron job to call pio train and pio deploy. The engine will continue to serve prediction results during retraining. After the training is completed, pio deploy will automatically shut down the existing engine server and bring up a new process on the same port.

Note that if you import a large data set and the training seems to take forever or get stuck, it is likely that there is not enough executor memory. We recommend setting up a Spark standalone cluster; you will also need to specify more driver and executor memory when training with a large data set. Please see the FAQ for instructions.

Next: DASE Components Explained

\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-predictionio-site/blob/25938169/templates/leadscoring/quickstart/index.html.gz
----------------------------------------------------------------------
diff --git a/templates/leadscoring/quickstart/index.html.gz b/templates/leadscoring/quickstart/index.html.gz
new file mode 100644
index 0000000..adbbec1
Binary files /dev/null and b/templates/leadscoring/quickstart/index.html.gz differ