predictionio-user mailing list archives

From Donald Szeto <don...@apache.org>
Subject Re: How to access Spark Context in predict?
Date Tue, 27 Sep 2016 04:19:22 GMT
Hi Hasan,

Does your randomForestModel contain any RDD?

If so, implement your algorithm by extending PAlgorithm, have your model
extend PersistentModel, and implement PersistentModelLoader to save and
load your model. You will be able to perform RDD operations within
predict() by using the model's RDD.

If not, implement your algorithm by extending P2LAlgorithm, and see if
PredictionIO can automatically persist the model for you. The convention
assumes that a non-RDD model does not require Spark to perform any RDD
operations, so there will be no SparkContext access.
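As a rough illustration of the PAlgorithm convention described above — a minimal sketch only, where the class names, the RDD element type, and the storage path are hypothetical, and the import package may differ between pre-Apache (io.prediction) and Apache (org.apache.predictionio) releases:

```scala
import org.apache.predictionio.controller.{Params, PersistentModel, PersistentModelLoader}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

case class MyParams(appName: String) extends Params

class MyModel(val userFeatures: RDD[(String, Double)])
    extends PersistentModel[MyParams] {
  // Returning true tells PredictionIO the model was persisted and can be
  // restored at deploy time via the companion PersistentModelLoader.
  override def save(id: String, params: MyParams, sc: SparkContext): Boolean = {
    userFeatures.saveAsObjectFile(s"/tmp/$id/userFeatures")
    true
  }
}

object MyModel extends PersistentModelLoader[MyParams, MyModel] {
  // Called at deploy time; sc is provided by PredictionIO, so the restored
  // model's RDD is attached to a live SparkContext and usable in predict().
  override def apply(id: String, params: MyParams, sc: Option[SparkContext]): MyModel =
    new MyModel(sc.get.objectFile[(String, Double)](s"/tmp/$id/userFeatures"))
}
```

In predict(), `model.userFeatures.context` then yields a valid SparkContext, because the RDD was reconstructed through the loader rather than deserialized.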

Are these conventions not fitting your use case? Feedback is always
welcome for improving PredictionIO.

Regards,
Donald


On Mon, Sep 26, 2016 at 9:05 AM, Hasan Can Saral <hasancansaral@gmail.com>
wrote:

> Hi Marcin,
>
> I did look at the definition of PersistentModel, and indeed replaced
> LocalFileSystemPersistentModel with PersistentModel. Thank you for this, I
> really appreciate your help.
>
> However, I am having quite a hard time understanding how I can access the sc
> object that PredictionIO provides to the save and apply methods from within
> the predict method.
>
> class SomeModel(randomForestModel: RandomForestModel, dummyRDD: RDD[_]) extends PersistentModel[SomeAlgorithmParams] {
>
>   override def save(id: String, params: SomeAlgorithmParams, sc: SparkContext): Boolean = {
>     // Here I should save randomForestModel to a file, but how to?
>     // Tried saveAsObjectFile but no luck.
>     true
>   }
> }
>
> object SomeModel extends PersistentModelLoader[SomeAlgorithmParams, SomeModel] {
>   override def apply(id: String, params: SomeAlgorithmParams, sc: Option[SparkContext]): SomeModel = {
>     // Here should I load randomForestModel from file? How?
>     new SomeModel(randomForestModel)
>   }
> }
>
> So, my questions have become:
> 1- Can I save randomForestModel? If yes, how? If I cannot, I will have to
> return false and retrain upon deployment. How do I skip pio train in this
> case?
> 2- How do I load the saved randomForestModel from file? If I cannot, should I
> remove object SomeModel extends PersistentModelLoader altogether?
> 3- How do I access sc within predict? Do I save a dummy RDD, load it in
> apply, and call .context on it? In that case, what happens to randomForestModel?
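On question 1: if randomForestModel is Spark MLlib's org.apache.spark.mllib.tree.model.RandomForestModel, that class implements Saveable and ships its own save/load, which could be called from the PersistentModel hooks instead of saveAsObjectFile. A hedged sketch — the helper names and the storage path are made up for illustration:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.tree.model.RandomForestModel

object ForestPersistence {
  // Persist the trained forest under a run-specific path.
  // MLlib writes it as Parquet plus metadata, so no manual serialization is needed.
  def saveForest(sc: SparkContext, model: RandomForestModel, id: String): Boolean = {
    model.save(sc, s"/models/$id/randomForest")
    true
  }

  // Restore the forest at deploy time, e.g. inside a PersistentModelLoader.
  def loadForest(sc: SparkContext, id: String): RandomForestModel =
    RandomForestModel.load(sc, s"/models/$id/randomForest")
}
```

With save returning true, pio deploy would call the loader rather than retraining, which answers question 2 as well.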
>
> I am really quite confused and would really appreciate some help/sample
> code if you have time.
> Thank you.
> Hasan
>
>
> On Mon, Sep 26, 2016 at 2:56 PM, Marcin Ziemiński <zieminm@gmail.com>
> wrote:
>
>> Hi Hasan,
>>
>> So I guess, there are two things here:
>> 1. You need SparkContext for predictions
>> 2. You also need to retrain your model during loading
>>
>> Please, look at the definition of PersistentModel and the comments
>> attached:
>>
>> trait PersistentModel[AP <: Params] {
>>   /** Save the model to some persistent storage.
>>    *
>>    * This method should return true if the model has been saved successfully so
>>    * that PredictionIO knows that it can be restored later during deployment.
>>    * This method should return false if the model cannot be saved (or should
>>    * not be saved due to configuration) so that PredictionIO will re-train the
>>    * model during deployment. All arguments of this method are provided
>>    * automatically by PredictionIO.
>>    *
>>    * @param id ID of the run that trained this model.
>>    * @param params Algorithm parameters that were used to train this model.
>>    * @param sc An Apache Spark context.
>>    */
>>   def save(id: String, params: AP, sc: SparkContext): Boolean
>> }
>>
>> In order to achieve the desired result you could simply use
>> PersistentModel instead of LocalFileSystemPersistentModel and return false
>> from save. Then during deployment your model will be retrained through your
>> Algorithm implementation. You shouldn't need to retrain your model in
>> implementations of PersistentModelLoader - that is rather for loading
>> models that are already trained and stored somewhere.
>> You can keep the SparkContext instance provided to the train method for use
>> in predict(...) (assuming that your algorithm is an instance of PAlgorithm
>> or P2LAlgorithm). Thus you should have what you need.
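Marcin's suggestion could look roughly like this — a sketch under stated assumptions, not a definitive implementation; the names are placeholders and the import package may differ by PredictionIO version:

```scala
import org.apache.predictionio.controller.{Params, PersistentModel}
import org.apache.spark.SparkContext

case class SomeAlgorithmParams(appName: String) extends Params

// The model keeps a handle to the SparkContext it was trained with, so that
// predict() can issue PEventStore queries. It is marked @transient because a
// SparkContext must never be serialized. The model itself is never persisted:
// save returns false, so PredictionIO re-runs the algorithm's train() on every
// pio deploy, which is what hands the model a fresh, live SparkContext.
class SomeModel(@transient val sc: SparkContext)
    extends PersistentModel[SomeAlgorithmParams] {
  override def save(id: String, params: SomeAlgorithmParams, sc: SparkContext): Boolean =
    false
}
```

Because save always returns false, the companion PersistentModelLoader is never invoked along this path; deployment goes straight back through train().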
>>
>> Regards,
>> Marcin
>>
>>
>>
>> On Fri, Sep 23, 2016 at 17:46, Hasan Can Saral <
>> hasancansaral@gmail.com> wrote:
>>
>>> Hi Marcin!
>>>
>>> Thank you for your answer.
>>>
>>> I do only need SparkContext, but have no idea on:
>>> 1- How do I retrieve it from PersistentModelLoader?
>>> 2- How do I access sc in the predict method using the configuration below?
>>>
>>> class SomeModel() extends LocalFileSystemPersistentModel[SomeAlgorithmParams] {
>>>   override def save(id: String, params: SomeAlgorithmParams, sc: SparkContext): Boolean = {
>>>     false
>>>   }
>>> }
>>>
>>> object SomeModel extends LocalFileSystemPersistentModelLoader[SomeAlgorithmParams, SomeModel] {
>>>   override def apply(id: String, params: SomeAlgorithmParams, sc: Option[SparkContext]): SomeModel = {
>>>     new SomeModel() // HERE I TRAIN AND RETURN THE TRAINED MODEL
>>>   }
>>> }
>>>
>>> Thank you very much, I really appreciate it!
>>>
>>> Hasan
>>>
>>>
>>> On Thu, Sep 22, 2016 at 7:05 PM, Marcin Ziemiński <zieminm@gmail.com>
>>> wrote:
>>>
>>>> Hi Hasan,
>>>>
>>>> I think that your problem comes from using a deserialized RDD, which has
>>>> already lost its connection with SparkContext.
>>>> A similar case can be found here:
>>>> http://stackoverflow.com/questions/29567247/serializing-rdd
>>>>
>>>> If you only really need SparkContext, you could probably use the one
>>>> provided to PersistentModelLoader, which would be implemented by your model.
>>>> Alternatively, you could implement PersistentModel to return false
>>>> from the save method. In this case your algorithm would be retrained on deploy,
>>>> which would also provide you with an instance of SparkContext.
>>>>
>>>> Regards,
>>>> Marcin
>>>>
>>>>
>>>> On Thu, Sep 22, 2016 at 13:34, Hasan Can Saral <
>>>> hasancansaral@gmail.com> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I am trying to query the Event Server with the PEventStore API in the predict
>>>>> method to fetch events per entity to create my features. PEventStore needs sc,
>>>>> and for this, I have:
>>>>>
>>>>> - Extended PAlgorithm
>>>>> - Extended LocalFileSystemPersistentModel and LocalFileSystemPersistentModelLoader
>>>>> - Put a dummy emptyRDD into my model
>>>>> - Tried to access sc with model.dummyRDD.context, and received this
>>>>> error:
>>>>>
>>>>> org.apache.spark.SparkException: RDD transformations and actions can
>>>>> only be invoked by the driver, not inside of other transformations; for
>>>>> example, rdd1.map(x => rdd2.values.count() * x) is invalid because the
>>>>> values transformation and count action cannot be performed inside of the
>>>>> rdd1.map transformation. For more information, see SPARK-5063.
>>>>>
>>>>> Just like this user got it here
>>>>> <https://groups.google.com/forum/#!topic/predictionio-user/h4kIltGIIYE>
>>>>> in the predictionio-user group. Any suggestions?
>>>>>
>>>>> Here's more of my predict method:
>>>>>
>>>>> def predict(model: SomeModel, query: Query): PredictedResult = {
>>>>>
>>>>>   val appName = sys.env.getOrElse[String]("APP_NAME", ap.appName)
>>>>>
>>>>>   val previousEvents = try {
>>>>>     PEventStore.find(
>>>>>       appName = appName,
>>>>>       entityType = Some(ap.entityType),
>>>>>       entityId = Some(query.entityId.getOrElse(""))
>>>>>     )(model.dummyRDD.context).map(event => {
>>>>>       Try(new CustomEvent(
>>>>>         Some(event.event),
>>>>>         Some(event.entityType),
>>>>>         Some(event.entityId),
>>>>>         Some(event.eventTime),
>>>>>         Some(event.creationTime),
>>>>>         Some(new Properties(
>>>>>           ...
>>>>>         ))
>>>>>       ))
>>>>>     }).filter(_.isSuccess).map(_.get)
>>>>>   } catch {
>>>>>     case e: Exception => // fatal because of error, an empty query
>>>>>       logger.error(s"Error when reading events: ${e}")
>>>>>       throw e
>>>>>   }
>>>>>
>>>>>   ...
>>>>> }
>>>>>
>>>
>>>
>>> --
>>>
>>> Hasan Can Saral
>>> hasancansaral@gmail.com
>>>
>>
>
>
> --
>
> Hasan Can Saral
> hasancansaral@gmail.com
>
