From: git-site-role@apache.org
To: commits@predictionio.incubator.apache.org
Reply-To: dev@predictionio.apache.org
Date: Sat, 21 Oct 2017 05:45:56 -0000
Message-Id: <3d0efd558a50466dba734fb29e5f0af8@git.apache.org>
In-Reply-To: <998cc4beaf834ae3baf16fb834b3a607@git.apache.org>
References: <998cc4beaf834ae3baf16fb834b3a607@git.apache.org>
Subject: [38/51] [partial] incubator-predictionio-site git commit: Documentation based on apache/incubator-predictionio#e6ea7dd2888564ae9b3242647cbf07af6287a3ee

http://git-wip-us.apache.org/repos/asf/incubator-predictionio-site/blob/452034c1/customize/dase/index.html
----------------------------------------------------------------------
diff --git a/customize/dase/index.html b/customize/dase/index.html
index 41c106e..2bc07d9 100644
--- a/customize/dase/index.html
+++ b/customize/dase/index.html
@@ -1,4 +1,4 @@
-Implementing DASE

This section gives you an overview of DASE components and how to implement them. You will find links to some engine templates for more concrete examples.

DataSource

DataSource reads and selects useful data from the Event Store (data store of the Event Server) and returns TrainingData.

readTraining()

You need to implement readTraining() of PDataSource, where you can use the PEventStore Engine API to read events and create TrainingData from them.

The following code example reads user "view" and "buy" item events, filters specific types of events for further processing, and returns TrainingData accordingly.

1
+Implementing DASE

This section gives you an overview of DASE components and how to implement them. You will find links to some engine templates for more concrete examples.

DataSource

DataSource reads and selects useful data from the Event Store (data store of the Event Server) and returns TrainingData.

readTraining()

You need to implement readTraining() of PDataSource, where you can use the PEventStore Engine API to read events and create TrainingData from them.

The following code example reads user "view" and "buy" item events, filters specific types of events for further processing, and returns TrainingData accordingly.

1
 2
 3
 4
@@ -51,7 +51,7 @@
   }
 
 }
-

Using PEventStore Engine API

Please see Event Server Overview to understand EventAPI and event modeling.

With the PEventStore Engine API, you can easily read different events in DataSource and get the information you need.

For example, let's say you have events like the following:

1
+

Using PEventStore Engine API

Please see Event Server Overview to understand EventAPI and event modeling.

With the PEventStore Engine API, you can easily read different events in DataSource and get the information you need.

For example, let's say you have events like the following:

1
 2
 3
 4
@@ -123,7 +123,7 @@
           throw e
       }
     }
-

If you have used the special events $set/$unset/$delete to set an entity's properties, you can retrieve those properties with PEventStore.aggregateProperties().

Please see event modeling to understand usage of special $set/$unset/$delete events.

For example, the following code shows how you could retrieve the properties of the "item" entities:

1
+

If you have used the special events $set/$unset/$delete to set an entity's properties, you can retrieve those properties with PEventStore.aggregateProperties().

Please see event modeling to understand usage of special $set/$unset/$delete events.
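To make the aggregation idea concrete, here is a minimal, dependency-free sketch of how a sequence of $set/$unset/$delete events can be folded into the current properties of each entity. It does not call PredictionIO: PropertyEvent and aggregate() are hypothetical names invented for this illustration, and properties are simplified to string values.

```scala
object AggregatePropertiesSketch {
  // Hypothetical, simplified stand-in for a stored event (not the PredictionIO Event class).
  final case class PropertyEvent(
    event: String,                   // "$set", "$unset" or "$delete"
    entityId: String,
    properties: Map[String, String], // for "$unset", only the keys matter
    eventTime: Long)

  // Replays the events in time order and returns the resulting properties per entity.
  def aggregate(events: Seq[PropertyEvent]): Map[String, Map[String, String]] =
    events.sortBy(_.eventTime).foldLeft(Map.empty[String, Map[String, String]]) {
      (state, e) =>
        e.event match {
          case "$set" =>
            // Merge new values into the entity's existing properties.
            val current = state.getOrElse(e.entityId, Map.empty[String, String])
            state.updated(e.entityId, current ++ e.properties)
          case "$unset" =>
            // Remove the named properties, if the entity exists.
            state.get(e.entityId) match {
              case Some(props) => state.updated(e.entityId, props -- e.properties.keys)
              case None        => state
            }
          case "$delete" =>
            // Drop the entity entirely.
            state - e.entityId
          case _ =>
            // Ordinary (non-special) events do not change properties.
            state
        }
    }
}
```

In this sketch an entity that is set, partially unset and never deleted ends up with only its remaining properties, while a deleted entity disappears from the result.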

For example, the following code shows how you could retrieve the properties of the "item" entities:

1
 2
 3
 4
@@ -168,7 +168,7 @@
       }
 
     }
-

Example:

Preparator

Preparator pre-processes TrainingData, performing any necessary feature selection and data processing, and generates PreparedData, which contains the data the Algorithm needs.

A few example usages of Preparator:

  • Feature extraction
  • Common pre-processing logic if you have multiple algorithms
  • For simple cases, the Preparator may simply pass the same TrainingData as PreparedData for Algorithm.

prepare()

You need to implement the prepare() method of PPreparator to perform such tasks.

Example:

Algorithm

The two methods of the Algorithm class are train() and predict():

train()

train() is responsible for training a predictive model. It is called when you run pio train. Apache PredictionIO (incubating) will store this model.

predict()

predict() is responsible for using this model to make predictions. It is called when you send a JSON query to the engine. Note that predict() is called in real time.

Apache PredictionIO (incubating) supports two types of algorithms:

P2LAlgorithm

For P2LAlgorithm, the Model is automatically serialized and persisted by Apache PredictionIO (incubating) after training.

Implementing IPersistentModel and IPersistentModelLoader is optional for P2LAlgorithm.

Example:

PAlgorithm

PAlgorithm should be used when your Model contains an RDD. The model produced by PAlgorithm is not persisted by default. To persist the model, you need to do the following:

  • The Model class should extend the IPersistentModel trait and implement the save() method for saving the model. The IPersistentModel trait requires one type parameter: the class of the algorithm parameters.
  • Implement a Model factory object that extends the IPersistentModelLoader trait and implements the apply() method for loading the model. The IPersistentModelLoader trait requires two type parameters: the class of the algorithm parameters and the class of the model produced by the algorithm.

Example:

Using LEventStore Engine API in predict()

You may use LEventStore.findByEntity() to retrieve events of a specific entity. For example, you can retrieve recent events of the user specified in the query and use them to make predictions in real time.

For example, the following code reads the 10 most recent view events of query.user:

1
+

Example:

Preparator

Preparator pre-processes TrainingData, performing any necessary feature selection and data processing, and generates PreparedData, which contains the data the Algorithm needs.

A few example usages of Preparator:

  • Feature extraction
  • Common pre-processing logic if you have multiple algorithms
  • For simple cases, the Preparator may simply pass the same TrainingData as PreparedData for Algorithm.

prepare()

You need to implement the prepare() method of PPreparator to perform such tasks.
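As an illustration of the feature-extraction use case, the following dependency-free sketch turns raw view events into per-(user, item) view counts. It is not the PPreparator API: the real prepare() also runs on Spark, and ViewEvent, TrainingData, PreparedData and this prepare() signature are hypothetical stand-ins chosen for the example.

```scala
object PreparatorSketch {
  // Hypothetical stand-ins for an engine's own data classes.
  final case class ViewEvent(user: String, item: String)
  final case class TrainingData(views: Seq[ViewEvent])
  final case class PreparedData(viewCounts: Map[(String, String), Int])

  // Feature extraction: collapse raw events into a count per (user, item) pair.
  def prepare(trainingData: TrainingData): PreparedData =
    PreparedData(
      trainingData.views
        .groupBy(v => (v.user, v.item))
        .map { case (key, events) => key -> events.size })
}
```

For the simple pass-through case mentioned above, prepare() would just wrap the TrainingData fields in PreparedData without transforming them.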

Example:

Algorithm

The two methods of the Algorithm class are train() and predict():

train()

train() is responsible for training a predictive model. It is called when you run pio train. Apache PredictionIO will store this model.

predict()

predict() is responsible for using this model to make predictions. It is called when you send a JSON query to the engine. Note that predict() is called in real time.
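The split between the two methods can be sketched without any PredictionIO dependency: train() does the expensive work once and returns a model, and predict() only performs a cheap lookup against that model for each query. The popularity-ranking model below, along with the PreparedData, Model, Query and PredictedResult classes, is a hypothetical example and not code from an engine template.

```scala
object AlgorithmSketch {
  // Hypothetical stand-ins for an engine's own data classes.
  final case class PreparedData(buys: Seq[(String, String)]) // (user, item) pairs
  final case class Model(topItems: Vector[String])           // items, most bought first
  final case class Query(num: Int)
  final case class PredictedResult(items: Seq[String])

  // Runs once at training time: rank items by how often they were bought.
  def train(data: PreparedData): Model = {
    val counts = data.buys.groupBy(_._2).map { case (item, bs) => item -> bs.size }
    // Sort by descending count, breaking ties by item id for a stable order.
    Model(counts.toVector.sortBy { case (item, n) => (-n, item) }.map(_._1))
  }

  // Runs for every query: a cheap read of the precomputed ranking.
  def predict(model: Model, query: Query): PredictedResult =
    PredictedResult(model.topItems.take(query.num))
}
```

Keeping predict() this light matters because it sits on the request path, whereas train() can take as long as the data requires.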

Apache PredictionIO supports two types of algorithms:

P2LAlgorithm

For P2LAlgorithm, the Model is automatically serialized and persisted by Apache PredictionIO after training.

Implementing IPersistentModel and IPersistentModelLoader is optional for P2LAlgorithm.

Example:

PAlgorithm

PAlgorithm should be used when your Model contains an RDD. The model produced by PAlgorithm is not persisted by default. To persist the model, you need to do the following:

  • The Model class should extend the IPersistentModel trait and implement the save() method for saving the model. The IPersistentModel trait requires one type parameter: the class of the algorithm parameters.
  • Implement a Model factory object that extends the IPersistentModelLoader trait and implements the apply() method for loading the model. The IPersistentModelLoader trait requires two type parameters: the class of the algorithm parameters and the class of the model produced by the algorithm.
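The two steps above pair a save() method on the model with an apply() method on a loader object. The sketch below mirrors only that pairing, using local files instead of Spark: it does not extend the PredictionIO traits, and AlgorithmParams, Model, ModelLoader and the simplified method signatures are assumptions made for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object PersistentModelSketch {
  // Hypothetical algorithm parameters; here they only say where models are stored.
  final case class AlgorithmParams(modelDir: Path)

  // Step 1: the model class knows how to save itself under a given id.
  final case class Model(weights: Vector[Double]) {
    def save(id: String, params: AlgorithmParams): Boolean = {
      val text = weights.mkString("\n")
      Files.write(params.modelDir.resolve(s"$id.model"), text.getBytes(StandardCharsets.UTF_8))
      true // signals that the model was persisted
    }
  }

  // Step 2: a factory object whose apply() rebuilds the model from the same id and params.
  object ModelLoader {
    def apply(id: String, params: AlgorithmParams): Model = {
      val bytes = Files.readAllBytes(params.modelDir.resolve(s"$id.model"))
      val text = new String(bytes, StandardCharsets.UTF_8)
      Model(text.split("\n").filter(_.nonEmpty).map(_.toDouble).toVector)
    }
  }
}
```

The important property is the round trip: whatever save() writes under an id, the loader's apply() must be able to read back with the same id and parameters.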

Example:

Using LEventStore Engine API in predict()

You may use LEventStore.findByEntity() to retrieve events of a specific entity. For example, you can retrieve recent events of the user specified in the query and use them to make predictions in real time.

For example, the following code reads the 10 most recent view events of query.user:

1
 2
 3
 4
@@ -211,7 +211,7 @@
         logger.error(s"Error when read recent events: ${e}")
         throw e
     }
-

Example:

Serving

serve()

You need to implement the serve() method of the class LServing. The serve() method processes the predicted results. It is also responsible for combining multiple predicted results into one if you have more than one predictive model.
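One possible way to combine results from several models is sketched below: the per-item scores from every model are summed and the top items are returned. The sketch is dependency-free and does not extend LServing; Query, ItemScore, PredictedResult and the score-summing rule are hypothetical choices, and a real engine might instead average, weight, or simply take the first result.

```scala
object ServingSketch {
  // Hypothetical stand-ins for an engine's own query and result classes.
  final case class Query(user: String, num: Int)
  final case class ItemScore(item: String, score: Double)
  final case class PredictedResult(itemScores: Seq[ItemScore])

  // Merges one PredictedResult per model into a single result for the client.
  def serve(query: Query, predictedResults: Seq[PredictedResult]): PredictedResult = {
    val summed = predictedResults
      .flatMap(_.itemScores)
      .groupBy(_.item)
      .map { case (item, scores) => ItemScore(item, scores.map(_.score).sum) }
    // Highest combined score first; ties are broken by item id for a stable order.
    PredictedResult(summed.toSeq.sortBy(s => (-s.score, s.item)).take(query.num))
  }
}
```

With a single model, this serve() just re-sorts and truncates that model's own result.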

Example:
