Batch Predictions

Overview

Process predictions for many queries using efficient parallelization through Spark. Useful for mass auditing of predictions and for generating predictions to push into other systems.

Batch predict reads and writes multi-object JSON files similar to the batch import format. JSON objects are separated by newlines and cannot themselves contain unencoded newlines.
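For example, a two-query input file in this format might look like the following (the comment field is purely illustrative); note that a newline inside a string value must be escaped as \n rather than appearing as a literal line break:

{"user":"1","comment":"first line\nsecond line"}
{"user":"2"}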

Compatibility

pio batchpredict loads the engine and processes queries exactly like pio deploy. There is only one additional requirement for engines to utilize batch predict:

All algorithm classes used in the engine must be serializable. This is already true for PredictionIO's base algorithm classes, but may be broken by including non-serializable fields in their constructors. Using the @transient annotation may help in these cases.
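As a minimal sketch of that pattern (the class and field names below are hypothetical, not PredictionIO base classes), a non-serializable constructor-derived member can be marked @transient lazy so it is rebuilt on each Spark executor instead of being serialized with the algorithm instance:

// Hypothetical stand-in for a non-serializable resource (e.g. an HTTP client).
class ScoringClient(endpoint: String) {
  def score(user: String): Double = user.length.toDouble // placeholder logic
}

case class AlgorithmParams(endpoint: String) extends Serializable

class MyAlgorithm(ap: AlgorithmParams) extends Serializable {
  // Excluded from serialization; re-created lazily on first use on each executor.
  @transient private lazy val client: ScoringClient = new ScoringClient(ap.endpoint)

  def predict(user: String): Double = client.score(user)
}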

This requirement is due to processing the input queries as a Spark RDD which enables high-performance parallelization, even on a single machine.

Usage

pio batchpredict

Command to process bulk predictions. Takes the same options as pio deploy plus:

--input <value>

Path to file containing queries; a multi-object JSON file with one query object per line. Accepts any valid Hadoop file URL.

Default: batchpredict-input.json

--output <value>

Path to file to receive results; a multi-object JSON file with one object per line, each containing the prediction plus the original query. Accepts any valid Hadoop file URL. Actual output will be written as Hadoop partition files in a directory with the output name.

Default: batchpredict-output.json

--query-partitions <value>

Configure the concurrency of predictions by setting the number of partitions used internally for the RDD of queries. This will directly affect the number of resulting part-* output files. While setting this to 1 may seem appealing as a way to get a single output file, it removes parallelization for the batch process, reducing performance and possibly exhausting memory.

Default: the number of partitions chosen by the Spark context's textFile (typically the number of cores available on the local machine)
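For example, to spread the queries across eight partitions (the partition count here is illustrative; tune it to your data size and available cores):

pio batchpredict \
  --input batchpredict-input.json \
  --output batchpredict-output.json \
  --query-partitions 8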

--engine-instance-id <value>

Identifier for the trained instance to use for batch predict.

Default: the latest trained instance.
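For example, to run batch predictions against a specific trained instance rather than the latest one (the instance ID below is a placeholder; substitute an ID reported by your own pio train run):

pio batchpredict \
  --input batchpredict-input.json \
  --output batchpredict-output.json \
  --engine-instance-id YOUR_ENGINE_INSTANCE_ID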

Example

Input

A multi-object JSON file of queries as they would be sent to the engine's HTTP Queries API.

The input is read via SparkContext's textFile, so it may be a single file or any supported Hadoop format.

File: batchpredict-input.json

{"user":"1"}
{"user":"2"}
{"user":"3"}
{"user":"4"}
{"user":"5"}

Execute

pio batchpredict \
  --input batchpredict-input.json \
  --output batchpredict-output.json

This command will run to completion, aborting if any errors are encountered.

Output

A multi-object JSON file of predictions + original queries. The predictions are JSON objects as they would be returned from the engine's HTTP Queries API.

Results are written via Spark RDD's saveAsTextFile so each partition will be written to its own part-* file. See post-processing results.

File 1: batchpredict-output.json/part-00000

{"query":{"user":"1"},"prediction":{"itemScores":[{"item":"1","score":33},{"item":"2","score":32}]}}
{"query":{"user":"3"},"prediction":{"itemScores":[{"item":"2","score":16},{"item":"3","score":12}]}}
{"query":{"user":"4"},"prediction":{"itemScores":[{"item":"3","score":19},{"item":"1","score":18}]}}

File 2: batchpredict-output.json/part-00001

{"query":{"user":"2"},"prediction":{"itemScores":[{"item":"5","score":55},{"item":"3","score":28}]}}
{"query":{"user":"5"},"prediction":{"itemScores":[{"item":"1","score":24},{"item":"4","score":14}]}}

Post-processing Results

After the process exits successfully, the parts may be concatenated into a single output file using a command like:

cat batchpredict-output.json/part-* > batchpredict-output-all.json
Command Line

Overview

Interaction with Apache PredictionIO (incubating) is done through the command line interface. It follows the format of:

pio <command> [options] <args>...

You can run pio help to see a list of all available commands and pio help <command> to see details of the command.

Apache PredictionIO (incubating) commands can be separated into the following three categories.

General Commands

pio help Displays a usage summary. Run pio help <command> to read about a specific subcommand.

pio version Displays the version of the installed PredictionIO.

pio status Displays install path and running status of PredictionIO system and its dependencies.

Event Server Commands

pio eventserver Launch the Event Server.

pio app Manage apps that are used by the Event Server.

pio app data-delete <name> deletes all data associated with the app.

pio app delete <name> deletes the app and its data.

--ip <value> IP to bind to. Defaults to localhost.

--port <value> Port to bind to. Defaults to 7070.
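For example, assuming the --ip and --port flags above apply to pio eventserver, launching the Event Server on all interfaces at the default port would look like:

pio eventserver --ip 0.0.0.0 --port 7070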

pio accesskey Manage app access keys.

Engine Commands

Engine commands need to be run from the directory that contains the engine project. The --debug and --verbose flags provide debug and third-party informational messages.

pio build Build the engine at the current directory.

pio train Kick off a training using an engine.

pio deploy Deploy an engine as an engine server.

pio batchpredict Process bulk predictions using an engine.

For pio deploy and pio batchpredict, if --engine-instance-id is not specified, the latest trained instance will be used.
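As a usage sketch, a typical engine cycle with the commands above, run from the engine project directory, looks like:

pio build
pio train
pio deploy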

Contribute Code

Thank you for your interest in contributing to Apache PredictionIO (incubating). Our mission is to enable developers to build scalable machine learning applications easily. Here is how you can help with the project's development. If you have any questions regarding development at any time, please feel free to subscribe and post to the Development Mailing List.

Areas in Need of Help

We accept contributions of all kinds at any time. We are compiling this list to show features that are highly sought after by the community.

  • Tests and CI
  • Engine template, tutorials, and samples
  • Client SDKs
  • Building engines in Java (updating the Java controller API)
  • Code clean up and refactoring
  • Code and data pipeline optimization
  • Developer experience (UX) improvement

How to Report an Issue

If you wish to report an issue you found, you can do so on Apache PredictionIO (incubating) JIRA.

How to Help Resolve Existing Issues

In general, bug fixes should be done the same way as new features, but critical bug fixes will follow a different path.

How to Add / Propose a New Feature

Before adding new features into JIRA, please check that the feature does not currently exist in JIRA.

  1. To propose a new feature, simply subscribe and post your proposal to the Apache PredictionIO (incubating) Development Mailing List.
  2. Discuss with the community and the core development team what needs to be done, and lay down concrete plans on deliverables.
  3. Once solid plans are made, start creating tickets in the issue tracker.
  4. Work side by side with other developers, using the Apache PredictionIO (incubating) Development Mailing List as the primary mode of communication. You never know if someone else has a better idea. ;)

Adding ticket to JIRA

  1. Add a descriptive Summary and a detailed description
  2. Set Issue Type to Bug, Improvement, New Feature, Test or Wish
  3. Set Priority to Blocker, Critical, Major, Minor or Trivial
  4. Fill out Affects Version with the version of PredictionIO you are currently using
  5. Fill out Environment if needed for description of your bug / feature
  6. Please leave other fields blank

Triaging JIRA

Tickets will be triaged by PredictionIO committers.

  • Target Version: Either a particular version, or Future if it is to be done later

    • Once a fix has been committed, the Fix Version will be filled in with the appropriate release
  • Component: Each ticket will be annotated with one or more of the following Components

    • Core: affects the main code branch / will be part of a release
    • Documentation: affects the documents / will be pushed to the livedoc branch
    • Templates: affects one of the separate GitHub repositories for a template

How to Issue a Pull Request

When you have finished your code, you can create a pull request against the develop branch.

  • The title must contain a tag associated with an existing JIRA ticket. You must create a ticket so that the infrastructure can correctly track issues across Apache JIRA and GitHub. If your ticket is PIO-789, your title must look something like [PIO-789] Some short description.
  • Please also include the JIRA ticket number in your commit message summary, similar to the above.
  • Make sure the title and description are clear and concise. For more details on writing a good commit message, check out this guide.
  • If the change is visual, make sure to include a screenshot or GIF.
  • Make sure it is being opened into the right branch.
  • Make sure it has been rebased on top of that branch.

When it is close to a release, and if there is major development ongoing, a release branch will be forked from the develop branch to stabilize the code for binary release. Please refer to the git flow methodology page for more information.

Getting Started

Apache PredictionIO (incubating) relies heavily on the git flow methodology. Please make sure you read and understand it before you start your development. By default, cloning Apache PredictionIO (incubating) will put you on the develop branch, which in most cases is where all the latest development goes.

For core development, please follow the Scala Style Guide.

Create a Fork of the Apache PredictionIO (incubating) Repository

  1. Start by creating a GitHub account if you do not already have one.
  2. Go to Apache PredictionIO (incubating)'s GitHub mirror and fork it to your own account.
  3. Clone your fork to your local machine (see the sketch below).

If you need additional help, please refer to https://help.github.com/articles/fork-a-repo/.
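A sketch of step 3, assuming your fork lives at github.com/<your-username>/incubator-predictionio:

git clone https://github.com/<your-username>/incubator-predictionio.git
cd incubator-predictionio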

Building Apache PredictionIO (incubating) from Source

After the previous section, you should have a copy of Apache PredictionIO (incubating) on your local machine ready to be built.

  1. Make sure you are on the develop branch. You can double-check with git status or simply git checkout develop.
  2. At the root of the repository, run ./make-distribution.sh to build PredictionIO. (Both steps are shown together below.)
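Putting the two steps above together:

git checkout develop
./make-distribution.sh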

Setting Up the Environment

Apache PredictionIO (incubating) relies on third-party software to perform its tasks. To set them up, simply follow this documentation.

Start Hacking

You should have an Apache PredictionIO (incubating) development environment by now. Happy hacking!

Anatomy of Apache PredictionIO (incubating) Code Tree

The following describes each directory's purpose.

bin

Shell scripts and any relevant components to go into the binary distribution. Utility shell scripts can also be included here.

conf

Configuration files that are used by both the source tree and the binary distribution.

core

Core Apache PredictionIO (incubating) code that provides the DASE controller API, core data structures, and workflow creation and management code.

data

Apache PredictionIO (incubating) Event Server, and the backend-agnostic storage layer for the event store and metadata store.

docs

Source code for the http://predictionio.incubator.apache.org site, and any other documentation support files.

examples

Complete code examples showing Apache PredictionIO (incubating)'s application.

sbt

Embedded SBT (Simple Build Tool) launcher.

storage

Storage implementations.

tools

Tools for running Apache PredictionIO (incubating). Contains primarily the CLI (command-line interface) and its supporting code, and the experimental evaluation dashboard.
