flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-2162) Implement adaptive learning rate strategies for SGD
Date Thu, 04 Jun 2015 11:47:38 GMT

     [ https://issues.apache.org/jira/browse/FLINK-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-2162:
---------------------------------
    Description: 
At the moment, the SGD implementation uses a simple adaptive learning rate strategy, {{adaptedLearningRate = initialLearningRate/sqrt(iterationNumber)}}, which makes the optimization algorithm sensitive to the setting of the {{initialLearningRate}}. If this value is chosen poorly, the SGD might become unstable.
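
For illustration, here is a minimal, standalone Scala sketch of this decay rule (the object and method names are invented for this example and are not taken from the FlinkML code base):

{code:scala}
// Sketch only: reproduces the inverse-square-root decay described above.
object SimpleDecay {
  def adaptedLearningRate(initialLearningRate: Double, iterationNumber: Int): Double =
    initialLearningRate / math.sqrt(iterationNumber.toDouble)

  def main(args: Array[String]): Unit = {
    // The schedule only rescales the initial value: with 1.0 the rate is still
    // 0.1 after 100 iterations, with 0.001 it has already shrunk to 0.0001.
    // A badly chosen initialLearningRate is therefore never corrected.
    for (init <- Seq(1.0, 0.001); it <- Seq(1, 10, 100))
      println(f"init=$init%7.3f iter=$it%3d rate=${adaptedLearningRate(init, it)}%.6f")
  }
}
{code}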

There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. They promise more stable optimization algorithms that require less hyperparameter tuning. It might be worthwhile to investigate these approaches.
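
To make the difference concrete, here is a hedged sketch of an Adagrad-style per-coordinate update as described in [3]; it works on plain arrays and is only illustrative, not a proposal for the FlinkML optimization API:

{code:scala}
// Adagrad-style step: each weight coordinate accumulates its squared gradients
// and divides the global learning rate by the square root of that sum.
object AdagradSketch {
  /** Returns the updated weights and the updated squared-gradient accumulator. */
  def step(
      weights: Array[Double],
      gradient: Array[Double],
      sqGradSum: Array[Double],
      initialLearningRate: Double,
      epsilon: Double = 1e-8): (Array[Double], Array[Double]) = {
    val newSqGradSum = sqGradSum.zip(gradient).map { case (s, g) => s + g * g }
    val newWeights = weights.indices.map { i =>
      // Frequently updated coordinates get a smaller effective step size,
      // rarely updated ones keep a larger one, reducing the sensitivity to
      // the choice of initialLearningRate.
      weights(i) - initialLearningRate / math.sqrt(newSqGradSum(i) + epsilon) * gradient(i)
    }.toArray
    (newWeights, newSqGradSum)
  }
}
{code}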

It might also be interesting to look at the implementation of Vowpal Wabbit [6].

Resources:
[1] [http://imgur.com/a/Hqolp]
[2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
[3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
[4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
[5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]
[6] [https://github.com/JohnLangford/vowpal_wabbit]

  was:
At the moment, the SGD implementation uses a simple adaptive learning rate strategy, {{adaptedLearningRate = initialLearningRate/sqrt(iterationNumber)}}, which makes the optimization algorithm sensitive to the setting of the {{initialLearningRate}}. If this value is chosen poorly, the SGD might become unstable.

There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. They promise more stable optimization algorithms that require less hyperparameter tuning. It might be worthwhile to investigate these approaches.

Resources:
[1] [http://imgur.com/a/Hqolp]
[2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
[3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
[4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
[5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]


> Implement adaptive learning rate strategies for SGD
> ---------------------------------------------------
>
>                 Key: FLINK-2162
>                 URL: https://issues.apache.org/jira/browse/FLINK-2162
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Priority: Minor
>              Labels: ML
>
> At the moment, the SGD implementation uses a simple adaptive learning rate strategy, {{adaptedLearningRate = initialLearningRate/sqrt(iterationNumber)}}, which makes the optimization algorithm sensitive to the setting of the {{initialLearningRate}}. If this value is chosen poorly, the SGD might become unstable.
> There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. They promise more stable optimization algorithms that require less hyperparameter tuning. It might be worthwhile to investigate these approaches.
> It might also be interesting to look at the implementation of Vowpal Wabbit [6].
> Resources:
> [1] [http://imgur.com/a/Hqolp]
> [2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
> [3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
> [4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
> [5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]
> [6] [https://github.com/JohnLangford/vowpal_wabbit]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
