predictionio-user mailing list archives

From Mars Hall <m...@heroku.com>
Subject Re: Saving predictions on training data with unsupervised learning
Date Sun, 05 Mar 2017 17:52:41 GMT
On Sat, Mar 4, 2017 at 20:10 Kenneth Chan <kenneth@apache.org> wrote:
> I guess your use case is not for real time label classify for unseen data?


Yes, in addition to real-time classification of unseen data with `POST /queries.json`, we also
seek insight into the clusters discovered while training the model.

> batch prediction is basically the same as batch eval.
> see if this example helps?
> 
> http://predictionio.incubator.apache.org/templates/recommendation/batch-evaluator/

The batch evaluator looks like it provides the right combination of features to solve
our problem via the `pio eval` CLI command.
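
To check my understanding of the wiring: I think we'd define an Evaluation that pairs our
engine with an evaluator that persists every (Query, PredictedResult) pair, roughly like the
sketch below. `ClusterEngine`, `LabelPersistingEvaluator`, and `KMeansParams` are placeholders
for our own classes, and I may well have the details wrong:

    import org.apache.predictionio.controller.{EngineParams, EngineParamsGenerator, Evaluation}

    // Sketch only: pair our engine with an evaluator that writes out every
    // (Query, PredictedResult) pair instead of computing a metric.
    object LabelTrainingData extends Evaluation {
      engineEvaluator = (ClusterEngine(), new LabelPersistingEvaluator())
    }

    // `pio eval` also takes an EngineParamsGenerator naming the params to run with.
    object LabelTrainingDataParams extends EngineParamsGenerator {
      engineParamsList = Seq(
        EngineParams(algorithmParamsList = Seq(("kmeans", KMeansParams(k = 10)))))
    }

If I'm reading it right, `pio eval com.example.LabelTrainingData com.example.LabelTrainingDataParams`
(with our real package names) would then push the whole training set through the engine.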

Can a batch evaluator be invoked programmatically within or after model#save? (Sorry if that's
a stupid question; I'm still new to Scala & the PIO source.) It looks like `EvaluationWorkflow.runEvaluation(…)`
might work: https://github.com/apache/incubator-predictionio/blob/d45d0a4170ef73695e0fffe921e5bda2c37e2281/core/src/test/scala/org/apache/predictionio/workflow/EvaluationWorkflowTest.scala#L41

Thanks for your guidance Kenneth,

*Mars

( <> .. <> )

On Sat, Mar 4, 2017 at 11:56 AM Mars Hall <mars@heroku.com> wrote:
Hi 🐸 folks,

When using unsupervised learning algorithms (like K-Means), we need to save the predicted labels
(cluster IDs) for the training data back into the datastore. Ideally, we want to automatically
save bulk predictions for the training data after the model is created, when the RDD/DataFrame
of all that data is already resident in Spark memory. It seems complex & inefficient to
develop a whole separate process that (re)selects all that training data and then iteratively
POSTs to `/queries.json` to get every prediction…
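
For concreteness, the labeling step itself is cheap once the data is in memory; with Spark
MLlib's K-Means it's roughly the sketch below (the k, iteration count, and the
(entityId, features) pairing are made up for illustration):

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // Sketch: label every training point with its cluster ID while the data
    // is still an RDD in Spark memory, instead of re-querying it later.
    def labelTrainingData(points: RDD[(String, Vector)]): RDD[(String, Int)] = {
      val model = KMeans.train(points.values, 10, 20)  // k = 10, maxIterations = 20
      points.mapValues(v => model.predict(v))          // (entityId, clusterId) pairs
    }

The open question is where those (entityId, clusterId) pairs should go once the model is trained.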

Would the persistent model's #save method be the right place to add a `bulk_save_predictions()`
function that writes predictions back into the event data store?
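
Concretely, I was imagining something like the sketch below being called from #save. I don't
know whether `Storage.getPEvents()` is public or intended for use from engine code, so treat
this as a guess; the entity type and property name are also made up:

    import org.apache.predictionio.data.storage.{DataMap, Event, Storage}
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import org.joda.time.DateTime
    import org.json4s.JsonAST.JInt

    // Guess at a bulk_save_predictions(): turn (entityId, clusterId) pairs into
    // $set events and batch-write them with the parallel event store API
    // (which `pio import` appears to use internally).
    def bulkSavePredictions(sc: SparkContext, appId: Int, labels: RDD[(String, Int)]): Unit = {
      val events: RDD[Event] = labels.map { case (entityId, clusterId) =>
        Event(
          event = "$set",
          entityType = "user",  // assumption: our entities are users
          entityId = entityId,
          properties = DataMap(Map("cluster" -> JInt(BigInt(clusterId)))),
          eventTime = DateTime.now)
      }
      Storage.getPEvents().write(events, appId)(sc)
    }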

How do you folks label the training data from an unsupervised algorithm?

Any suggestions for making bulk predictions that mesh with PredictionIO's workflow?

*Mars Hall
Customer Facing Architect
Salesforce App Cloud / Heroku
San Francisco, California



