predictionio-user mailing list archives

From Tihomir Lolić <tihomir.lo...@gmail.com>
Subject Re: Dynamically change parameter list
Date Thu, 15 Feb 2018 11:32:31 GMT
Hi Pat,

just wanted to follow up on this. I've modified CoreWorkflow to be able to
store algorithmsParams in the engineInstance.

val engineInstances = Storage.getMetaDataEngineInstances
engineInstances.update(engineInstance.copy(
  status = "COMPLETED",
  endTime = DateTime.now,
  algorithmsParams =
    if (models(0).isInstanceOf[CustomCrossValidatorModel])
      JsonExtractor.paramsToJson(workflowConfig.jsonExtractor, algorithmParamsList)
    else
      engineInstance.algorithmsParams
))

Because I am using CrossValidator, I had to extend it with one additional
parameter that I wanted to save while the model is being trained.

I don't need this saved data during retraining, only during prediction.
If I did need it during retraining, I would modify TrainApp to fetch the
data before training starts, which would also cover the reinforcement
case.
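
For anyone who wants to try the idea without a PredictionIO setup, here is a
rough, dependency-free sketch of the update step. It is not the actual
CoreWorkflow code: InstanceRecord, InstanceStore, PlainModel and MappingModel
are hypothetical stand-ins for EngineInstance, the metadata store and the
trained models, and the JSON rendering is hand-rolled for illustration only.

```scala
// Hypothetical stand-ins; none of these are PredictionIO classes.
final case class InstanceRecord(id: String, status: String, algorithmsParams: String)

trait Model
final case class PlainModel(weights: Vector[Double]) extends Model
// A model that also carries a value computed during training.
final case class MappingModel(weights: Vector[Double], dataMapping: Map[String, Int]) extends Model

// In-memory substitute for the metadata store.
object InstanceStore {
  private val records = scala.collection.mutable.Map.empty[String, InstanceRecord]
  def insert(r: InstanceRecord): Unit = records(r.id) = r
  def update(r: InstanceRecord): Unit = records(r.id) = r
  def get(id: String): Option[InstanceRecord] = records.get(id)
}

object PersistTrainedParams {
  // Render the mapping as a JSON object; keys are assumed not to need escaping.
  def mappingToJson(m: Map[String, Int]): String =
    m.toSeq.sortBy(_._1).map { case (k, v) => "\"" + k + "\":" + v }.mkString("{", ",", "}")

  // Mark the instance as completed. Overwrite the stored params only when the
  // first model carries a computed mapping; otherwise keep the original params.
  def complete(instance: InstanceRecord, models: Seq[Model]): InstanceRecord = {
    val params = models.headOption match {
      case Some(m: MappingModel) => mappingToJson(m.dataMapping)
      case _                     => instance.algorithmsParams
    }
    instance.copy(status = "COMPLETED", algorithmsParams = params)
  }

  def main(args: Array[String]): Unit = {
    val initial = InstanceRecord("1", "INIT", "{}")
    InstanceStore.insert(initial)
    val trained = Seq(MappingModel(Vector(0.1, 0.2), Map("red" -> 0, "blue" -> 1)))
    InstanceStore.update(complete(initial, trained))
    // At prediction time the stored params can be read back from the store.
    println(InstanceStore.get("1"))
  }
}
```

The sketch uses headOption instead of models(0) so that an empty model list
falls back to the original params rather than throwing.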

Hope this helps anyone who runs into a similar scenario.

Best,
Tihomir


On Tue, Feb 13, 2018 at 12:35 AM, Pat Ferrel <pat@occamsmachete.com> wrote:

> That would be fine since the model can contain anything. But the real
> question is where you want to use those params. If you need to use them the
> next time you train, you’ll have to persist them to a place read during
> training. That is usually only the metadata store (obviously input events
> too), which has the contents of engine.json. So to get them into the
> metadata store you may have to alter engine.json.
>
> Unless someone else knows how to alter the metadata directly after `pio
> train`
>
> One problem is that you will never know what the new params are without
> putting them in a file or logging them. We keep them in a separate place
> and merge them with engine.json explicitly so we can see what is happening.
> They are calculated parameters, not hand made tunings. It seems important
> to me to keep those separate unless you are talking about some type of
> expected reinforcement learning, not really params but an evolving model.
>
>
> On Feb 12, 2018, at 2:48 PM, Tihomir Lolić <tihomir.lolic@gmail.com>
> wrote:
>
> Thank you very much for the answer. I'll try with customizing workflow.
> There is a step where Seq of models is returned. My idea is to return model
> and model parameters in this step. I'll let you know if it works.
>
> Thanks,
> Tihomir
>
> On Feb 12, 2018 23:34, "Pat Ferrel" <pat@occamsmachete.com> wrote:
>
>> This is an interesting question. As we make more mature full featured
>> engines they will begin to employ hyper parameter search techniques or
>> reinforcement params. This means that there is a new stage in the workflow
>> or a feedback loop not already accounted for.
>>
>> Short answer is no, unless you want to re-write your engine.json after
>> every train and probably keep the old one for safety. You must re-train to
>> get the new params put into the metastore and therefore available to your
>> engine.
>>
>> What we do for the Universal Recommender is have a special new workflow
>> phase, call it a self-tuning phase, where we search for the right tuning of
>> parameters. This is done with code that runs outside of pio and creates
>> parameters that go into the engine.json. This can be done periodically to
>> make sure the tuning is still optimal.
>>
>> Not sure whether feedback or hyper parameter search is the best
>> architecture for you.
>>
>>
>> From: Tihomir Lolić <tihomir.lolic@gmail.com>
>> Reply: user@predictionio.apache.org
>> Date: February 12, 2018 at 2:02:48 PM
>> To: user@predictionio.apache.org
>> Subject: Dynamically change parameter list
>>
>> Hi,
>>
>> I am trying to figure out how to dynamically update the algorithm parameter
>> list. After training finishes, only the model is updated. I need this data
>> to be updated because I am creating a data mapping based on the training
>> data. Is there a way to update this data after training is done?
>>
>> Here is the code that I am using. The variable that should be updated
>> after training is algorithmParamsList (marked in bold red in the original
>> message).
>>
>> import io.prediction.controller.{EmptyParams, EngineParams}
>> import io.prediction.data.storage.EngineInstance
>> import io.prediction.workflow.CreateWorkflow.WorkflowConfig
>> import io.prediction.workflow._
>> import org.apache.spark.ml.linalg.SparseVector
>> import org.joda.time.DateTime
>> import org.json4s.JsonAST._
>>
>> import scala.collection.mutable
>>
>> object TrainApp extends App {
>>
>>   val envs = Map("FOO" -> "BAR")
>>
>>   val sparkEnv = Map("spark.master" -> "local")
>>
>>   val sparkConf = Map("spark.executor.extraClassPath" -> ".")
>>
>>   val engineFactoryName = "LogisticRegressionEngine"
>>
>>   val workflowConfig = WorkflowConfig(
>>     engineId = EngineConfig.engineId,
>>     engineVersion = EngineConfig.engineVersion,
>>     engineVariant = EngineConfig.engineVariantId,
>>     engineFactory = engineFactoryName
>>   )
>>
>>   val workflowParams = WorkflowParams(
>>     verbose = workflowConfig.verbosity,
>>     skipSanityCheck = workflowConfig.skipSanityCheck,
>>     stopAfterRead = workflowConfig.stopAfterRead,
>>     stopAfterPrepare = workflowConfig.stopAfterPrepare,
>>     sparkEnv = WorkflowParams().sparkEnv ++ sparkEnv
>>   )
>>
>>   WorkflowUtils.modifyLogging(workflowConfig.verbose)
>>
>>   val dataSourceParams = DataSourceParams(sys.env.get("APP_NAME").get)
>>   val preparatorParams = EmptyParams()
>>
>>   // This is the value that should be updated after training.
>>   val algorithmParamsList = Seq("Logistic" -> LogisticParams(
>>     columns = Array[String](),
>>     dataMapping = Map[String, Map[String, SparseVector]]()))
>>   val servingParams = EmptyParams()
>>
>>   val engineInstance = EngineInstance(
>>     id = "",
>>     status = "INIT",
>>     startTime = DateTime.now,
>>     endTime = DateTime.now,
>>     engineId = workflowConfig.engineId,
>>     engineVersion = workflowConfig.engineVersion,
>>     engineVariant = workflowConfig.engineVariant,
>>     engineFactory = workflowConfig.engineFactory,
>>     batch = workflowConfig.batch,
>>     env = envs,
>>     sparkConf = sparkConf,
>>     dataSourceParams = JsonExtractor.paramToJson(workflowConfig.jsonExtractor,
>>       workflowConfig.engineParamsKey -> dataSourceParams),
>>     preparatorParams = JsonExtractor.paramToJson(workflowConfig.jsonExtractor,
>>       workflowConfig.engineParamsKey -> preparatorParams),
>>     algorithmsParams = JsonExtractor.paramsToJson(workflowConfig.jsonExtractor,
>>       algorithmParamsList),
>>     servingParams = JsonExtractor.paramToJson(workflowConfig.jsonExtractor,
>>       workflowConfig.engineParamsKey -> servingParams)
>>   )
>>
>>   val (engineLanguage, engineFactory) =
>>     WorkflowUtils.getEngine(engineInstance.engineFactory, getClass.getClassLoader)
>>
>>   val engine = engineFactory()
>>
>>   val engineParams = EngineParams(
>>     dataSourceParams = dataSourceParams,
>>     preparatorParams = preparatorParams,
>>     algorithmParamsList = algorithmParamsList,
>>     servingParams = servingParams
>>   )
>>
>>   val engineInstanceId = CreateServer.engineInstances.insert(engineInstance)
>>
>>   CoreWorkflow.runTrain(
>>     env = envs,
>>     params = workflowParams,
>>     engine = engine,
>>     engineParams = engineParams,
>>     engineInstance = engineInstance.copy(id = engineInstanceId)
>>   )
>>
>>   CreateServer.actorSystem.shutdown()
>> }
>>
>>
>> Thank you,
>> Tihomir
>>
>>
>
