Date: Thu, 4 Jun 2015 11:47:38 +0000 (UTC)
From: "Till Rohrmann (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Updated] (FLINK-2162) Implement adaptive learning rate strategies for SGD

     [ https://issues.apache.org/jira/browse/FLINK-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-2162:
---------------------------------
    Description: 
At the moment, the SGD implementation uses a simple adaptive learning rate strategy, {{adaptedLearningRate = initialLearningRate/sqrt(iterationNumber)}}, which makes the optimization algorithm sensitive to the setting of the {{initialLearningRate}}. If this value is chosen poorly, the SGD might become unstable.

There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. They promise to yield more stable optimization algorithms which require less hyperparameter tweaking. It might be worthwhile to investigate these approaches.

It might also be interesting to look at the implementation of vowpal wabbit [6].

Resources:
[1] [http://imgur.com/a/Hqolp]
[2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
[3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
[4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
[5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]
[6] [https://github.com/JohnLangford/vowpal_wabbit]

  was:
At the moment, the SGD implementation uses a simple adaptive learning rate strategy, {{adaptedLearningRate = initialLearningRate/sqrt(iterationNumber)}}, which makes the optimization algorithm sensitive to the setting of the {{initialLearningRate}}. If this value is chosen poorly, the SGD might become unstable.

There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. They promise to yield more stable optimization algorithms which require less hyperparameter tweaking. It might be worthwhile to investigate these approaches.

Resources:
[1] [http://imgur.com/a/Hqolp]
[2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
[3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
[4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
[5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]
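For illustration, here is a minimal, self-contained Scala sketch of the current inverse-square-root decay next to two of the proposed strategies, Adagrad [3] and SGD with momentum [5]. All names ({{LearningRateSketch}}, {{Adagrad}}, {{Momentum}}, {{update}}, {{epsilon}}) are hypothetical and are not part of Flink's optimization API; weights and gradients are plain Scala arrays.

{code}
// Sketch only: illustrative names, not Flink's optimization API.
object LearningRateSketch {

  // Current strategy: decay the initial rate with the inverse square root
  // of the iteration number.
  def inverseSqrtDecay(initialLearningRate: Double, iterationNumber: Int): Double =
    initialLearningRate / math.sqrt(iterationNumber.toDouble)

  // Adagrad [3]: accumulate squared gradients per coordinate and scale each
  // coordinate's step by the inverse square root of that sum, so frequently
  // updated coordinates automatically receive smaller steps.
  final class Adagrad(initialLearningRate: Double, epsilon: Double = 1e-8) {
    private var squaredGradients: Array[Double] = _

    def update(weights: Array[Double], gradient: Array[Double]): Array[Double] = {
      if (squaredGradients == null) squaredGradients = new Array[Double](weights.length)
      Array.tabulate(weights.length) { i =>
        squaredGradients(i) += gradient(i) * gradient(i)
        val scale = initialLearningRate / (math.sqrt(squaredGradients(i)) + epsilon)
        weights(i) - scale * gradient(i)
      }
    }
  }

  // SGD with momentum [5]: keep an exponentially decaying velocity and step
  // along it instead of the raw gradient, which damps oscillations.
  final class Momentum(learningRate: Double, momentum: Double = 0.9) {
    private var velocity: Array[Double] = _

    def update(weights: Array[Double], gradient: Array[Double]): Array[Double] = {
      if (velocity == null) velocity = new Array[Double](weights.length)
      Array.tabulate(weights.length) { i =>
        velocity(i) = momentum * velocity(i) - learningRate * gradient(i)
        weights(i) + velocity(i)
      }
    }
  }
}
{code}

Either per-coordinate strategy could, in principle, replace the single global step size computed by the current inverse-square-root decay inside the SGD iteration.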
> Implement adaptive learning rate strategies for SGD
> ---------------------------------------------------
>
>                 Key: FLINK-2162
>                 URL: https://issues.apache.org/jira/browse/FLINK-2162
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Priority: Minor
>              Labels: ML
>
> At the moment, the SGD implementation uses a simple adaptive learning rate strategy, {{adaptedLearningRate = initialLearningRate/sqrt(iterationNumber)}}, which makes the optimization algorithm sensitive to the setting of the {{initialLearningRate}}. If this value is chosen poorly, the SGD might become unstable.
> There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. They promise to yield more stable optimization algorithms which require less hyperparameter tweaking. It might be worthwhile to investigate these approaches.
> It might also be interesting to look at the implementation of vowpal wabbit [6].
> Resources:
> [1] [http://imgur.com/a/Hqolp]
> [2] [http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html]
> [3] [http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf]
> [4] [http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf]
> [5] [http://www.willamette.edu/~gorr/classes/cs449/momrate.html]
> [6] [https://github.com/JohnLangford/vowpal_wabbit]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)