Importing Data in Batch

If you have a large amount of data to start with, performing batch import will be much faster than sending every event over an HTTP connection.

Preparing Input File

The import tool expects its input to be a file stored either in the local filesystem or on HDFS. Each line of the file should be a JSON object string representing an event. For more information about the format of the event JSON object, please refer to this page.

Shown below is an example that contains 5 events ready to be imported to the Event Server.

{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"0","eventTime":"2014-11-21T01:04:14.716Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"1","eventTime":"2014-11-21T01:04:14.722Z"}
{"event":"rate","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"2","properties":{"rating":1.0},"eventTime":"2014-11-21T01:04:14.729Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"7","eventTime":"2014-11-21T01:04:14.735Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"8","eventTime":"2014-11-21T01:04:14.741Z"}

Please make sure your import file does not contain any empty lines. An empty line is treated as a null object and will cause an error during import.
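If you are unsure whether a file is clean, a quick pre-processing pass can strip empty lines before import. A minimal sketch (the helper name and the cleaned-copy file name are assumptions for illustration):

```python
# Copy an events file, dropping empty lines so the batch import
# tool does not encounter null objects.
def strip_empty_lines(src_path, dst_path):
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():  # keep only non-empty lines
                dst.write(line)

# Usage: strip_empty_lines("my_events.json", "my_events_clean.json")
```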

Use SDK to Prepare Batch Input File

Some of the Apache PredictionIO (incubating) SDKs also provide a FileExporter client. You may use it to prepare the JSON file described above. The FileExporter creates events in the same way as EventClient, except that the events are written to a JSON file instead of being sent to the Event Server. The resulting JSON file can then be used for batch import.

import predictionio
from datetime import datetime
import pytz

# Create a FileExporter and specify "my_events.json" as the destination file
exporter = predictionio.FileExporter(file_name="my_events.json")

event_properties = {
    "someProperty": "value1",
    "anotherProperty": "value2",
    }
# write the events to a file
event_response = exporter.create_event(
    event="my_event",
    entity_type="user",
    entity_id="uid",
    target_entity_type="item",
    target_entity_id="iid",
    properties=event_properties,
    event_time=datetime(2014, 12, 13, 21, 38, 45, 618000, pytz.utc))

# ...

# close the FileExporter when finished writing all events
exporter.close()

Import Events from Input File

Importing events from a file can be done easily using the command line interface. Assuming that pio is in your search path, your App ID is 123, and the input file my_events.json is in your current working directory:

$ pio import --appid 123 --input my_events.json

After a brief while, the tool should return to the console without any error. Congratulations! You have successfully imported your events.

Channel

Each App has a default channel (without a name) which stores all incoming events. This default channel is used when no channel is specified.

You may create additional Channels for the App. Creating multiple Channels is an advanced usage; you don't need to create any in order to use Apache PredictionIO (incubating). A Channel is associated with one App only and must have a unique name within that App.

Creating multiple Channels makes it easier to identify, manage and use specific event data when you collect events from multiple sources (e.g. mobile, website, or a third-party webhooks service) for your application.

(More usage details coming soon...)

Create a new Channel

For example, to create a new channel "myChannel" for the app "myApp", run the following pio command:

pio app channel-new myApp myChannel

You should see output similar to the following:

[INFO] [App$] Updated Channel meta-data.
[INFO] [HBLEvents] The table predictionio_eventdata:events_5_2 doesn't exist yet. Creating now...
[INFO] [App$] Initialized Event Store for the channel: myChannel.
[INFO] [App$] Created new channel:
[INFO] [App$]     Channel Name: myChannel
[INFO] [App$]       Channel ID: 2
[INFO] [App$]           App ID: 5

Now "myChannel" is created and ready for collecting data.

Collect data through Channel

The Event API supports an optional channel query parameter. This allows you to import and query events of the specified channel. When the channel parameter is not specified, events are collected through the default channel.

URL: http://localhost:7070/events.json?accessKey=yourAccessKeyString&channel=yourChannelName

Query parameters:

Field      Type    Description
accessKey  String  The Access Key for your App
channel    String  The channel name (optional). Specify this to import data to this channel. NOTE: supported in PIO version >= 0.9.2 only. The channel must be created first.
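If you prefer to assemble the request URL programmatically, the query parameters above can be encoded with Python's standard library. A minimal sketch (the helper name is an assumption; the key and channel values are placeholders):

```python
from urllib.parse import urlencode

# Build the Event API URL, appending the optional channel parameter.
def events_url(base, access_key, channel=None):
    params = {"accessKey": access_key}
    if channel is not None:
        params["channel"] = channel
    return base + "/events.json?" + urlencode(params)

print(events_url("http://localhost:7070", "YOUR_ACCESS_KEY", "YOUR_CHANNEL"))
# → http://localhost:7070/events.json?accessKey=YOUR_ACCESS_KEY&channel=YOUR_CHANNEL
```

Using urlencode also takes care of escaping any characters in the key or channel name that are not URL-safe.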

For SDK usage, one EventClient should be responsible for collecting data of one specific channel. The channel name is specified when the EventClient object is instantiated.

For example, the following code imports an event to "YOUR_CHANNEL" of the corresponding App.

$ curl -i -X POST "http://localhost:7070/events.json?accessKey=YOUR_ACCESS_KEY&channel=YOUR_CHANNEL" \
-H "Content-Type: application/json" \
-d '{
  "event" : "my_event",
  "entityType" : "user",
  "entityId" : "uid",
  "targetEntityType" : "item",
  "targetEntityId" : "iid",
  "properties" : {
    "someProperty" : "value1",
    "anotherProperty" : "value2"
  },
  "eventTime" : "2004-12-13T21:39:45.618Z"
}'
from predictionio import EventClient
from datetime import datetime
import pytz

# Create an EventClient for "YOUR_CHANNEL"
client = EventClient('YOUR_ACCESS_KEY', "http://localhost:7070",
  channel='YOUR_CHANNEL') # the default channel is used if not specified

event_properties = {
    "someProperty": "value1",
    "anotherProperty": "value2",
    }
event_response = client.create_event(
    event="my_event",
    entity_type="user",
    entity_id="uid",
    target_entity_type="item",
    target_entity_id="iid",
    properties=event_properties,
    event_time=datetime(2014, 12, 13, 21, 38, 45, 618000, pytz.utc))

You can also follow the Event API debug recipes to query the events of a specific channel by adding the channel query parameter to the URL.

Delete a Channel (including all imported data)

pio app channel-delete <app name> <channel name>

Delete only the data of a Channel

pio app data-delete <app name> --channel <channel name>

Accessing Channel Data in Engine

To access channel data, simply specify the channel name when using the PEventStore or LEventStore API. Data is read from the default channel if channelName is not specified.

For example, to read data from the default channel:

    val eventsRDD: RDD[Event] = PEventStore.find(
      appName = dsp.appName,
      entityType = Some("user"),
      eventNames = Some(List("rate", "buy")), // read "rate" and "buy" events
      // targetEntityType is an optional field of an event.
      targetEntityType = Some(Some("item")))(sc)

For example, to read data from the channel "CHANNEL_NAME":

    val eventsRDD: RDD[Event] = PEventStore.find(
      appName = dsp.appName,
      channelName = Some("CHANNEL_NAME"), // ADDED
      entityType = Some("user"),
      eventNames = Some(List("rate", "buy")), // read "rate" and "buy" events
      // targetEntityType is an optional field of an event.
      targetEntityType = Some(Some("item")))(sc)