predictionio-user mailing list archives

From Hasan Can Saral <hasancansa...@gmail.com>
Subject Re: How to access Spark Context in predict?
Date Mon, 26 Sep 2016 16:05:42 GMT
Hi Marcin,

I did look at the definition of PersistentModel, and have indeed replaced
LocalFileSystemPersistentModel with PersistentModel. Thank you for this, I
really appreciate your help.

However, I am still having quite a hard time understanding how I can access
the sc object that PredictionIO provides to the save and apply methods from
within the predict method.

class SomeModel(randomForestModel: RandomForestModel, dummyRDD: RDD[_])
    extends PersistentModel[SomeAlgorithmParams] {

  override def save(id: String, params: SomeAlgorithmParams,
                    sc: SparkContext): Boolean = {
    // Here I should save randomForestModel to a file, but how to?
    // Tried saveAsObjectFile but no luck.
    true
  }
}

object SomeModel extends PersistentModelLoader[SomeAlgorithmParams, SomeModel] {
  override def apply(id: String, params: SomeAlgorithmParams,
                     sc: Option[SparkContext]): SomeModel = {
    // Here should I load randomForestModel from file? How?
    new SomeModel(randomForestModel)
  }
}
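For reference, MLlib's RandomForestModel does extend Saveable, so perhaps
something along these lines is what is expected instead of saveAsObjectFile
(just a sketch; the path below is a made-up placeholder):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.tree.model.RandomForestModel

// Sketch only: persist with MLlib's own save/load rather than saveAsObjectFile.
// The /tmp/models path is a placeholder, not a PredictionIO convention.
def saveForest(model: RandomForestModel, id: String, sc: SparkContext): Unit =
  model.save(sc, s"/tmp/models/$id")

def loadForest(id: String, sc: SparkContext): RandomForestModel =
  RandomForestModel.load(sc, s"/tmp/models/$id")
```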

So, my questions have become:
1- Can I save randomForestModel? If yes, how? If I cannot, I will have to
return false and retrain upon deployment. How do I skip pio train in that
case?
2- How do I load the saved randomForestModel from file? If I cannot, should
I remove object SomeModel extends PersistentModelLoader altogether?
3- How do I access sc within predict? Do I save a dummy RDD, load it in
apply, and call .context on it? In that case, what happens to randomForestModel?
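On question 3, one pattern that might work (a sketch, not verified) is to
rebuild the dummy RDD from the live SparkContext inside apply rather than
deserializing it, so that .context in predict points at a connected context.
The loadedForest helper and its path are hypothetical stand-ins:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.tree.model.RandomForestModel

// Sketch: the dummy RDD is recreated from the SparkContext that PredictionIO
// passes to apply on deploy, so model.dummyRDD.context is live in predict.
object SomeModel extends PersistentModelLoader[SomeAlgorithmParams, SomeModel] {
  override def apply(id: String, params: SomeAlgorithmParams,
                     sc: Option[SparkContext]): SomeModel = {
    val liveSc = sc.getOrElse(sys.error("SparkContext not provided on deploy"))
    val dummyRDD: RDD[Int] = liveSc.emptyRDD[Int] // fresh, tied to the live context
    new SomeModel(loadedForest(liveSc, id), dummyRDD)
  }

  // Hypothetical helper standing in for however the forest gets loaded.
  private def loadedForest(sc: SparkContext, id: String): RandomForestModel =
    RandomForestModel.load(sc, s"/tmp/models/$id") // placeholder path
}
```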

I am really quite confused and would really appreciate some help/sample
code if you have time.
Thank you.
Hasan


On Mon, Sep 26, 2016 at 2:56 PM, Marcin Ziemiński <zieminm@gmail.com> wrote:

> Hi Hasan,
>
> So I guess, there are two things here:
> 1. You need SparkContext for predictions
> 2. You also need to retrain you model during loading
>
> Please, look at the definition of PersistentModel and the comments
> attached:
>
> trait PersistentModel[AP <: Params] {
>   /** Save the model to some persistent storage.
>     *
>     * This method should return true if the model has been saved successfully so
>     * that PredictionIO knows that it can be restored later during deployment.
>     * This method should return false if the model cannot be saved (or should
>     * not be saved due to configuration) so that PredictionIO will re-train the
>     * model during deployment. All arguments of this method are provided
>     * automatically by PredictionIO.
>     *
>     * @param id ID of the run that trained this model.
>     * @param params Algorithm parameters that were used to train this model.
>     * @param sc An Apache Spark context.
>     */
>   def save(id: String, params: AP, sc: SparkContext): Boolean
> }
>
> In order to achieve the desired result, you could simply use
> PersistentModel instead of LocalFileSystemPersistentModel and return false
> from save. Then, during deployment, your model will be retrained through your
> Algorithm implementation. You shouldn't need to retrain your model in
> implementations of PersistentModelLoader - that is rather for loading
> models that are already trained and stored somewhere.
> You can keep the SparkContext instance provided to the train method for use
> in predict(...) (assuming that your algorithm is an instance of PAlgorithm
> or P2LAlgorithm). That way you should have what you need.
>
> Regards,
> Marcin
>
>
>
> On Fri, Sep 23, 2016 at 17:46, Hasan Can Saral <
> hasancansaral@gmail.com> wrote:
>
>> Hi Marcin!
>>
>> Thank you for your answer.
>>
>> I really do only need SparkContext, but I have no idea:
>> 1- How to retrieve it from PersistentModelLoader?
>> 2- How to access sc in the predict method using the configuration below?
>>
>> class SomeModel() extends LocalFileSystemPersistentModel[SomeAlgorithmParams] {
>>   override def save(id: String, params: SomeAlgorithmParams, sc: SparkContext): Boolean = {
>>     false
>>   }
>> }
>>
>> object SomeModel extends LocalFileSystemPersistentModelLoader[SomeAlgorithmParams, SomeModel] {
>>   override def apply(id: String, params: SomeAlgorithmParams, sc: Option[SparkContext]): SomeModel = {
>>     new SomeModel() // HERE I TRAIN AND RETURN THE TRAINED MODEL
>>   }
>> }
>>
>> Thank you very much, I really appreciate it!
>>
>> Hasan
>>
>>
>> On Thu, Sep 22, 2016 at 7:05 PM, Marcin Ziemiński <zieminm@gmail.com>
>> wrote:
>>
>>> Hi Hasan,
>>>
>>> I think that your problem comes from using a deserialized RDD, which
>>> has already lost its connection to the SparkContext.
>>> A similar case can be found here:
>>> http://stackoverflow.com/questions/29567247/serializing-rdd
>>>
>>> If you only really need the SparkContext, you could probably use the one
>>> provided to PersistentModelLoader, which would be implemented by your model.
>>> Alternatively, you could implement PersistentModel to return false from the
>>> save method. In this case your algorithm would be retrained on deploy,
>>> which would also provide you with an instance of SparkContext.
>>>
>>> Regards,
>>> Marcin
>>>
>>>
>>> On Thu, Sep 22, 2016 at 13:34, Hasan Can Saral <
>>> hasancansaral@gmail.com> wrote:
>>>
>>>> Hi!
>>>>
>>>> I am trying to query the Event Server with the PEventStore API in the
>>>> predict method, to fetch events per entity and build my features.
>>>> PEventStore needs sc, and to get it I have:
>>>>
>>>> - Extended PAlgorithm
>>>> - Extended LocalFileSystemPersistentModel and
>>>> LocalFileSystemPersistentModelLoader
>>>> - Put a dummy emptyRDD into my model
>>>> - Tried to access sc with model.dummyRDD.context to receive this
>>>> error:
>>>>
>>>> org.apache.spark.SparkException: RDD transformations and actions can
>>>> only be invoked by the driver, not inside of other transformations; for
>>>> example, rdd1.map(x => rdd2.values.count() * x) is invalid because the
>>>> values transformation and count action cannot be performed inside of the
>>>> rdd1.map transformation. For more information, see SPARK-5063.
>>>>
>>>> Just like the error this user got here
>>>> <https://groups.google.com/forum/#!topic/predictionio-user/h4kIltGIIYE>
>>>> in the predictionio-user group. Any suggestions?
>>>>
>>>> Here's more of my predict method:
>>>>
>>>> def predict(model: SomeModel, query: Query): PredictedResult = {
>>>>
>>>>   val appName = sys.env.getOrElse[String]("APP_NAME", ap.appName)
>>>>
>>>>   val previousEvents = try {
>>>>     PEventStore.find(
>>>>       appName = appName,
>>>>       entityType = Some(ap.entityType),
>>>>       entityId = Some(query.entityId.getOrElse(""))
>>>>     )(model.dummyRDD.context).map(event => {
>>>>       Try(new CustomEvent(
>>>>         Some(event.event),
>>>>         Some(event.entityType),
>>>>         Some(event.entityId),
>>>>         Some(event.eventTime),
>>>>         Some(event.creationTime),
>>>>         Some(new Properties(
>>>>           ...
>>>>         ))
>>>>       ))
>>>>     }).filter(_.isSuccess).map(_.get)
>>>>   } catch {
>>>>     case e: Exception => // fatal, rethrow with an empty result
>>>>       logger.error(s"Error when reading events: ${e}")
>>>>       throw e
>>>>   }
>>>>
>>>>   ...
>>>>
>>>> }
>>>>
>>>>
>>
>>
>> --
>>
>> Hasan Can Saral
>> hasancansaral@gmail.com
>>
>


-- 

Hasan Can Saral
hasancansaral@gmail.com
