predictionio-user mailing list archives

From Mars Hall <mars.h...@salesforce.com>
Subject Re: SparkContext in predict()
Date Wed, 13 Sep 2017 15:28:16 GMT
Hi Daniel,

Glad you figured it out! I already had this draft 90% composed, so I figured
I'd send it anyway:

In order to use SparkContext in predict, the algorithm needs to inherit
from PAlgorithm instead of P2LAlgorithm.

Implementing PAlgorithm requires a bit more code, because the model has to
be serialized and deserialized manually. SparkContext itself is not
serializable, so in Model#save you need to write the RDDs (or equivalent)
to the filesystem/S3, e.g. with saveAsObjectFile, and then in Model.apply
use a SparkContext to read those saved RDDs back.
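A minimal, dependency-free Scala sketch of why this split is needed. It is not PredictionIO or Spark code: LiveContext is a hypothetical stand-in for SparkContext (a live, non-serializable handle), a temp file stands in for the filesystem/S3, and Model.save / Model.load only mirror the shape of Model#save / Model.apply. In PredictionIO itself this role is played by the PersistentModel and PersistentModelLoader traits; check their exact signatures against the version you're on.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical stand-in for SparkContext: a live handle that is NOT Serializable.
final class LiveContext(val appName: String) {
  // Reads comma-separated doubles back from "storage" (here, a local file).
  def read(path: Path): Vector[Double] =
    new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
      .split(",").filter(_.nonEmpty).map(_.toDouble).toVector
}

// Anti-pattern: a model that carries the live handle inside it.
final case class ModelWithContext(weights: Vector[Double], ctx: LiveContext)

// Pattern: the model holds only plain, serializable data.
final case class Model(weights: Vector[Double]) {
  // Analogous to Model#save: persist the data, never the context.
  def save(path: Path): Unit =
    Files.write(path, weights.mkString(",").getBytes(StandardCharsets.UTF_8))
}

object Model {
  // Analogous to Model.apply: a context available at load time reads the data back.
  def load(path: Path, ctx: LiveContext): Model = Model(ctx.read(path))
}

object SerializationDemo {
  // True if plain Java serialization of the whole object succeeds.
  def javaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val ctx = new LiveContext("demo")
    val model = Model(Vector(0.25, 0.75))

    // The handle-bearing model cannot be serialized as a whole.
    println(javaSerializable(ModelWithContext(model.weights, ctx))) // prints false
    // The data-only model can.
    println(javaSerializable(model)) // prints true

    // Save the data, then rebuild the model with a context supplied at load time.
    val file = Files.createTempFile("model", ".csv")
    file.toFile.deleteOnExit()
    model.save(file)
    println(Model.load(file, ctx) == model) // prints true
  }
}
```

The design choice is the same one described above: keep only data in what gets persisted, and attach the live context when the model is loaded rather than trying to store it.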

On Wed, Sep 13, 2017 at 6:20 AM, Daniel O' Shaughnessy <danieljamesdavid@gmail.com> wrote:

> Hi,
>
> I'm trying to make use of the raw probabilities column in Spark ML's
> RandomForestClassificationModel (mllib's RandomForestModel doesn't
> have this feature):
>
> import org.apache.spark.ml.classification.{RandomForestClassificationModel,
> RandomForestClassifier}
>
> For this, I need to convert the query to a DataFrame/Dataset Row by
> accessing the sqlContext (built from SparkContext) in the predict method,
> but this doesn't seem to be allowed, AFAIK.
>
> Is there a way I can save the SparkContext in the model and pull it out
> from the model when needed?
>
> Thanks.


-- 
Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California
