predictionio-user mailing list archives

From Pat Ferrel <>
Subject Re: Dynamically change parameter list
Date Thu, 15 Feb 2018 17:49:06 GMT
There are several things to consider here. One is that the next time you train, the metadata
will be re-written from engine.json. This used to happen at `pio build`, but I think it was
moved to train. In any case, if you don’t need it as input to training, it should be part
of the model, right? The model is read during the predict phase and is always re-written
by train.
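As a minimal sketch of that idea (all names hypothetical, not PIO's actual classes): anything derived at train time can ride along inside the model object itself, since predict always reads the model back:

```scala
// Hypothetical sketch: bundle values derived from training data with
// the model itself, so predict() can read them without touching the
// metadata store.
case class TunedModel(
  coefficients: Map[String, Double], // learned by train
  dataMapping: Map[String, Int]      // derived from the training data
)

def train(rows: Seq[(String, Double)]): TunedModel = {
  // Derive a mapping from the training data (an index per distinct key).
  val mapping = rows.map(_._1).distinct.sorted.zipWithIndex.toMap
  // Toy "learning": average the values seen for each key.
  val coeffs = rows.groupBy(_._1).map { case (k, vs) =>
    k -> vs.map(_._2).sum / vs.size
  }
  TunedModel(coeffs, mapping)
}

def predict(model: TunedModel, key: String): Double =
  model.coefficients.getOrElse(key, 0.0) // mapping travels with the model

val model = train(Seq("a" -> 1.0, "a" -> 3.0, "b" -> 2.0))
```

Because train always overwrites the model, anything stored this way is regenerated on every run, which is exactly the behavior described above.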

BTW, I don’t use the PIO cross-validation stuff because it is too restrictive for something
that may be used for hyper-parameter search. I have external Python that drives PIO and collects
cross-validation results iteratively.
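The shape of such a driver, with the actual `pio` invocations elided and a stand-in scoring function in their place (both hypothetical), is just a loop over candidate parameter sets:

```scala
// Hypothetical sketch of an external hyper-parameter search driver.
// A real driver would shell out to `pio train` / `pio eval` for each
// candidate; here a local stand-in scoring function takes that place.
val candidates = for {
  regParam <- Seq(0.01, 0.1, 1.0)
  maxIter  <- Seq(10.0, 50.0)
} yield Map("regParam" -> regParam, "maxIter" -> maxIter)

// Stand-in for the cross-validation metric collected from a PIO run.
def score(params: Map[String, Double]): Double =
  1.0 / (1.0 + params("regParam")) + params("maxIter") / 100.0

// Keep the candidate with the best score.
val best = candidates.maxBy(score)
```

The winning parameter set is what would then be written back into engine.json before the next `pio train`.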

On Feb 15, 2018, at 3:32 AM, Tihomir Lolić <> wrote:

Hi Pat,

just wanted to follow up on this. I've modified CoreWorkflow to be able to store algorithmParams
in the engineInstance.

val engineInstances = Storage.getMetaDataEngineInstances
engineInstances.update(engineInstance.copy(
  status = "COMPLETED",
  endTime =,
  algorithmsParams =
    if (models(0).isInstanceOf[CustomCrossValidatorModel])
      JsonExtractor.paramsToJson(workflowConfig.jsonExtractor, algorithmParamsList)
    else
      engineInstance.algorithmsParams
))
Because I am using CrossValidator, I had to extend it with one additional parameter that I
wanted to save during training of the model.

I don't need this saved data during retraining, only during prediction. If I did need it
during retraining, I would modify TrainApp to fetch the data before starting the train,
which would also cover the reinforcement case.

Hope this helps someone who needs such a scenario.


On Tue, Feb 13, 2018 at 12:35 AM, Pat Ferrel <> wrote:
That would be fine, since the model can contain anything. But the real question is where you
want to use those params. If you need them the next time you train, you’ll have to persist
them to a place that is read during training. That is usually only the metadata store (plus
input events, obviously), which holds the contents of engine.json. So to get them into the
metadata store you may have to alter engine.json.
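Another option, if the values only need to survive into the next train, is to read the previous model back at the start of training and fold its parameters in. A hedged sketch with made-up names and a toy update rule:

```scala
// Hypothetical sketch: parameters survive into the next train by being
// read back out of the previous model at the start of training.
case class Model(weights: Map[String, Double])

def train(previous: Option[Model], data: Seq[(String, Double)]): Model = {
  // Start from the previous model's weights, if there is one.
  val start = previous.map(_.weights).getOrElse(Map.empty[String, Double])
  // Toy update rule: blend the carried-over weight with the new value.
  val updated = data.foldLeft(start) { case (w, (k, v)) =>
    w + (k -> (w.getOrElse(k, 0.0) * 0.5 + v * 0.5))
  }
  Model(updated)
}

val first  = train(None, Seq("a" -> 2.0))
val second = train(Some(first), Seq("a" -> 4.0))
```

This is closer to the "evolving model" case mentioned below than to true hyper-parameters.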

Unless someone else knows how to alter the metadata directly after `pio train`.

One problem is that you will never know what the new params are without putting them in a
file or logging them. We keep them in a separate place and merge them with engine.json explicitly,
so we can see what is happening. They are calculated parameters, not hand-made tunings. It
seems important to me to keep those separate, unless you are talking about some form of
reinforcement learning, where they are not really params but an evolving model.
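The merge described here can be as simple as overlaying the calculated values on the hand-written ones, so both remain visible. A minimal sketch in plain Scala (the keys are illustrative, not a real engine.json):

```scala
// Hypothetical sketch: hand-tuned parameters live in one map,
// calculated ones in another, and the merge is explicit so it is
// always clear where each value came from.
val handTuned = Map(
  "maxQueryEvents" -> 100,
  "maxCorrelatorsPerEventType" -> 50
)
val calculated = Map(
  "maxCorrelatorsPerEventType" -> 37 // produced by the external search
)

// `++` overlays the right operand, so calculated values win.
val merged = handTuned ++ calculated
```

Keeping the two sources in separate maps until the last moment is what makes it possible to diff a run's effective parameters against the hand-made baseline.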

On Feb 12, 2018, at 2:48 PM, Tihomir Lolić <> wrote:

Thank you very much for the answer. I'll try customizing the workflow. There is a step where
a Seq of models is returned. My idea is to return the model and the model parameters in this
step. I'll let you know if it works.


On Feb 12, 2018 23:34, "Pat Ferrel" <> wrote:
This is an interesting question. As we build more mature, full-featured engines, they will
begin to employ hyper-parameter search techniques or reinforcement params. This means there
is a new stage in the workflow, or a feedback loop, not already accounted for.

Short answer is no, unless you want to re-write your engine.json after every train and probably
keep the old one for safety. You must re-train to get the new params put into the metastore
and therefore made available to your engine.

What we do for the Universal Recommender is have a special new workflow phase, call it a self-tuning
phase, where we search for the right tuning of parameters. This is done with code that runs
outside of pio and creates parameters that go into engine.json. This can be done periodically
to make sure the tuning is still optimal.

Not sure whether feedback or hyper parameter search is the best architecture for you.

From: Tihomir Lolić <>
Reply: <>
Date: February 12, 2018 at 2:02:48 PM
To: <>
Subject: Dynamically change parameter list

> Hi,
> I am trying to figure out how to dynamically update the algorithm parameter list. After
the train is finished, only the model is updated. The reason I need this data to be updated
is that I am creating a data mapping based on the training data. Is there a way to update
this data after the train is done?
> Here is the code that I am using. The variable that should be updated after the train is
marked in bold red.
> import io.prediction.controller.{EmptyParams, EngineParams}
> import
> import io.prediction.workflow.CreateWorkflow.WorkflowConfig
> import io.prediction.workflow._
> import
> import org.joda.time.DateTime
> import org.json4s.JsonAST._
> import scala.collection.mutable
> object TrainApp extends App {
>   val envs = Map("FOO" -> "BAR")
>   val sparkEnv = Map("spark.master" -> "local")
>   val sparkConf = Map("spark.executor.extraClassPath" -> ".")
>   val engineFactoryName = "LogisticRegressionEngine"
>   val workflowConfig = WorkflowConfig(
>     engineId = EngineConfig.engineId,
>     engineVersion = EngineConfig.engineVersion,
>     engineVariant = EngineConfig.engineVariantId,
>     engineFactory = engineFactoryName
>   )
>   val workflowParams = WorkflowParams(
>     verbose = workflowConfig.verbosity,
>     skipSanityCheck = workflowConfig.skipSanityCheck,
>     stopAfterRead = workflowConfig.stopAfterRead,
>     stopAfterPrepare = workflowConfig.stopAfterPrepare,
>     sparkEnv = WorkflowParams().sparkEnv ++ sparkEnv
>   )
>   WorkflowUtils.modifyLogging(workflowConfig.verbose)
>   val dataSourceParams = DataSourceParams(sys.env.get("APP_NAME").get)
>   val preparatorParams = EmptyParams()
>   val algorithmParamsList = Seq("Logistic" -> LogisticParams(columns = Array[String](),
>                                                               dataMapping = Map[String,
Map[String, SparseVector]]()))
>   val servingParams = EmptyParams()
>   val engineInstance = EngineInstance(
>     id = "",
>     status = "INIT",
>     startTime =,
>     endTime =,
>     engineId = workflowConfig.engineId,
>     engineVersion = workflowConfig.engineVersion,
>     engineVariant = workflowConfig.engineVariant,
>     engineFactory = workflowConfig.engineFactory,
>     batch = workflowConfig.batch,
>     env = envs,
>     sparkConf = sparkConf,
>     dataSourceParams = JsonExtractor.paramToJson(workflowConfig.jsonExtractor, workflowConfig.engineParamsKey -> dataSourceParams),
>     preparatorParams = JsonExtractor.paramToJson(workflowConfig.jsonExtractor, workflowConfig.engineParamsKey -> preparatorParams),
>     algorithmsParams = JsonExtractor.paramsToJson(workflowConfig.jsonExtractor, algorithmParamsList),
>     servingParams = JsonExtractor.paramToJson(workflowConfig.jsonExtractor, workflowConfig.engineParamsKey -> servingParams)
>   )
>   val (engineLanguage, engineFactory) = WorkflowUtils.getEngine(engineInstance.engineFactory, getClass.getClassLoader)
>   val engine = engineFactory()
>   val engineParams = EngineParams(
>     dataSourceParams = dataSourceParams,
>     preparatorParams = preparatorParams,
>     algorithmParamsList = algorithmParamsList,
>     servingParams = servingParams
>   )
>   val engineInstanceId = CreateServer.engineInstances.insert(engineInstance)
>   CoreWorkflow.runTrain(
>     env = envs,
>     params = workflowParams,
>     engine = engine,
>     engineParams = engineParams,
>     engineInstance = engineInstance.copy(id = engineInstanceId)
>   )
>   CreateServer.actorSystem.shutdown()
> }
> Thank you,
> Tihomir
