Hi Sasha, Niketan, and Mike (sorry if I have missed anyone),
So far we have run into some problems and situations that need more
thought. Until then, let us start a preliminary script for checking
different scenarios with our existing top-level algorithms and deep
learning algorithms.
Along with the previously proposed ones, we can try:
1. Constraints (both the constrained and unconstrained cases)
2. A convergence rate check, maybe for settling on a prior (and our
convergence criteria, based on "Convergence Rates for Efficient Global
Optimization Algorithms": https://arxiv.org/pdf/1101.3501v3.pdf )
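
As a starting point for the preliminary script, here is a minimal sketch of an empirical convergence check. It is only an assumption about what such a check could look like: the names `best_so_far` and `has_stalled`, the window, and the tolerance are all hypothetical, and this is a simple stall test rather than the rate analysis from the paper above.

```python
def best_so_far(losses):
    """Running minimum of the observed losses, one entry per iteration."""
    trace, best = [], float("inf")
    for loss in losses:
        best = min(best, loss)
        trace.append(best)
    return trace


def has_stalled(losses, window=5, tol=1e-3):
    """True if the best loss improved by less than `tol` over the last `window` iterations.

    `window` and `tol` are illustrative defaults, not values taken from the paper.
    """
    trace = best_so_far(losses)
    if len(trace) <= window:
        return False  # not enough history to judge yet
    return trace[-window - 1] - trace[-1] < tol
```

Logging the `best_so_far` trace per scenario would also let us compare how quickly different priors settle.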
Maybe we could implement several priors instead of one in particular.
I am planning to keep my schedule free for a month to focus only on this
implementation. Given its importance for neural networks, where we need
lower memory consumption, especially to fit into GPUs, it would be
great if we could ship this with the `1.0` release.
Design document: http://bit.do/systemmlbayesian
Thank you very much,
Janardhan
On Wed, Aug 23, 2017 at 4:04 PM, Alexandre V Evfimievski <evfimi@us.ibm.com>
wrote:
> Hi Janardhan,
>
> The number of parameters could be rather large, that's certainly an issue
> for Bayesian Optimization. A perfect implementation would, perhaps, pick a
> sample of parameters and a sample of the dataset for every iteration. It
> seems that Sobol sequences require generating primitive polynomials of
> large degree. What is better: a higher-dimensional B.O., or a
> lower-dimensional one combined with parameter sampling? Probably the
> latter. By the way, in cases where parameters feed into heuristics, there
> may be considerable independence across the set of parameters, especially
> when conditioned by a specific dataset record. Each heuristic targets
> certain situations that arise in some records. Not sure how to take
> advantage of this.
>
> Thanks,
> Sasha
>
>
>
> From: Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
> To: Alexandre V Evfimievski <evfimi@us.ibm.com>, npansar@us.ibm.com,
> dev@systemml.apache.org
> Date: 08/10/2017 09:39 AM
>
> Subject: Re: Bayesian optimizer support for SystemML.
> 
>
>
>
> Hi Sasha,
>
> One more thing I would like to ask: what are your thoughts on the
> `sobol` function? What are the dimension requirement and the sampling
> pattern? Please help me understand exactly which tasks we are going to
> optimize in SystemML.
>
> Surrogate slice sampling: what are your thoughts about it?
>
> Thank you very much,
> Janardhan
>
> On Wed, Jul 26, 2017 at 12:25 AM, Alexandre V Evfimievski
> <evfimi@us.ibm.com> wrote:
> Hi, Janardhan,
>
> We are still studying Bayesian Optimization (B.O.); you are ahead of us!
> Just one comment: the "black box" loss function that is being optimized is
> not always totally black. Sometimes it is a sum of many small black-box
> functions. Suppose we want to train a complex system with many parameters
> over a large dataset. The system involves many heuristics, and the
> parameters feed into these heuristics. We want to minimize a loss
> function, which is a sum of individual losses, one per data record. We
> want to use B.O. to find an optimal vector of parameters. The parameters
> affect the system's behavior in complex ways and do not allow for the
> computation of a gradient. However, because the loss is a sum of many
> losses, when running B.O., we have a choice: either to run each test over
> the entire dataset, or to run over a small sample of the dataset (but try
> more parameter vectors per hour, say). The smaller the sample, the higher
> the variance of the loss. Not sure which implementation of B.O. is the
> best to handle such a case.
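
To make the sample-size trade-off above concrete, here is a small sketch under stated assumptions: the per-record losses are synthetic (drawn from an exponential distribution purely for illustration), the loss is averaged rather than summed so that different sample sizes are comparable, and the names `subsample_loss` and `spread_of_estimate` are hypothetical.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic stand-in for the per-record losses at one fixed parameter vector.
per_record_losses = [rng.expovariate(1.0) for _ in range(10_000)]


def subsample_loss(losses, sample_size, rng):
    """Estimate the mean loss from a random subset of the records."""
    batch = rng.sample(losses, sample_size)
    return sum(batch) / sample_size


def spread_of_estimate(sample_size, repeats=200):
    """Standard deviation of the subsampled loss estimate across repeated draws."""
    estimates = [
        subsample_loss(per_record_losses, sample_size, rng) for _ in range(repeats)
    ]
    return statistics.stdev(estimates)


spread_small = spread_of_estimate(50)    # cheap per test, noisy
spread_large = spread_of_estimate(2000)  # expensive per test, much less noisy
```

The spread shrinks roughly with the square root of the sample size, so the noise level that the B.O. surrogate has to absorb is something we can control through the sample size.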
>
> Thanks,
> Alexandre (Sasha)
>
>
>
> From: Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
> To: dev@systemml.apache.org
> Date: 07/25/2017 10:33 AM
> Subject: Re: Bayesian optimizer support for SystemML.
> 
>
>
>
> Hi Niketan and Mike,
>
> As we are trying to implement this Bayesian Optimization, should we take
> input from more committers as well? This optimizer seems to have a couple
> of possible implementations, and we may need to find out which suits us
> best.
>
> Thanks,
> Janardhan
>
> On Sat, Jul 22, 2017 at 3:41 PM, Janardhan Pulivarthi
> <janardhan.pulivarthi@gmail.com> wrote:
>
> > Dear committers,
> >
> > We are planning to add Bayesian optimizer support to SystemML for both ML
> > and deep learning tasks. Relevant JIRA link:
> > https://issues.apache.org/jira/browse/SYSTEMML979
> >
> > The following is a simple outline of how we are going to implement it.
> > Please feel free to make any kind of changes in this Google Docs link:
> > http://bit.do/systemmlbayesian
> >
> > Description:
> >
> > Bayesian optimization is a sequential design strategy for global
> > optimization of black-box functions that doesn't require derivatives.
> >
> > Process:
> >
> > 1. First, we select the point that is the best found over the iterations
> >    run so far.
> > 2. Candidate points are selected by sampling the space with a Sobol
> >    quasi-random sequence generator.
> > 3. Gaussian process hyperparameters are sampled with the surrogate slice
> >    sampling method.
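
One way to read steps 1 and 2 together is to score each candidate point with an acquisition function and evaluate the highest-scoring one next. The sketch below assumes expected improvement for a minimization problem, with a Gaussian predictive mean `mu` and standard deviation `sigma` supplied by some surrogate model; the outline above does not fix the acquisition function, and `expected_improvement` and `pick_next` are hypothetical names.

```python
import math


def expected_improvement(mu, sigma, best_loss):
    """Expected improvement over `best_loss` for a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best_loss - mu, 0.0)
    gamma = (best_loss - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(gamma / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * gamma * gamma) / math.sqrt(2.0 * math.pi)
    return sigma * (gamma * cdf + pdf)


def pick_next(candidates, best_loss):
    """Return the candidate point with the highest expected improvement.

    `candidates` is a list of (point, mu, sigma) tuples.
    """
    best = max(candidates, key=lambda c: expected_improvement(c[1], c[2], best_loss))
    return best[0]
```

Here `best_loss` is the best value observed so far (step 1) and the candidate points would come from the Sobol generator (step 2).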
> >
> >
> > Components:
> >
> > 1. Selecting the next point to evaluate.
> >
> > [image: nextpoint.PNG]
> >
> > We specify a uniform prior for the mean, m, and width-2 top-hat priors for
> > each of the D length scale parameters. As we expect the observation noise
> > generally to be close to or exactly zero, nu is given a horseshoe
> > prior. The covariance amplitude theta0 is given a zero-mean, unit-variance
> > lognormal prior, theta0 ~ ln N(0, 1).
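
A minimal sketch of how part of this prior could be written as a log-density, to be used later by the hyperparameter sampler. Several things here are assumptions: the top-hat interval is taken as [0, 2] (the text only gives the width), the flat prior on the mean contributes a constant and is left out, and the horseshoe prior on the noise is omitted because it has no simple closed form. The name `log_prior` is hypothetical.

```python
import math


def log_prior(amplitude, length_scales, low=0.0, width=2.0):
    """Log prior over the covariance amplitude and the D length scales.

    amplitude     ~ lognormal with zero mean and unit variance in log space
    length_scales ~ independent top-hat (uniform) priors on [low, low + width]
    """
    if amplitude <= 0.0:
        return -math.inf
    if any(not (low <= ell <= low + width) for ell in length_scales):
        return -math.inf  # outside the top-hat support
    log_amp = math.log(amplitude)
    lognormal = -log_amp - 0.5 * math.log(2.0 * math.pi) - 0.5 * log_amp ** 2
    top_hat = -len(length_scales) * math.log(width)
    return lognormal + top_hat
```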
> >
> >
> >
> > 2. Generation of the quasi-random Sobol sequence.
> >
> > Which kinds of Sobol patterns are needed?
> >
> > [image: sobol patterns.PNG]
> >
> > How many dimensions do we need?
> >
> > This paper gives its target dimension for generation as 21201. [pdf link:
> > https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf]
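
For a feel of what the generator has to do, here is a small sketch of an unscrambled Sobol sequence in Gray-code order, limited to three dimensions. The table `FIRST_DIMS` is a hypothetical name, and its entries (polynomial degree `s`, coefficient code `a`, initial direction numbers `m`) are my transcription of the first rows of the published direction-number table for the paper above, so please treat them as an assumption to verify. Going up to 21201 dimensions would mean loading the full table instead.

```python
BITS = 32

# (s, a, m) for dimensions 2 and 3; dimension 1 is the van der Corput sequence.
# Assumed transcription of the first rows of the published table.
FIRST_DIMS = [(1, 0, [1]), (2, 1, [1, 3])]


def direction_numbers(dim_index):
    """Direction numbers for one dimension, scaled to BITS-bit integers."""
    if dim_index == 0:
        return [1 << (BITS - i) for i in range(1, BITS + 1)]
    s, a, m = FIRST_DIMS[dim_index - 1]
    v = [m[i] << (BITS - (i + 1)) for i in range(s)]
    for i in range(s, BITS):
        value = v[i - s] ^ (v[i - s] >> s)
        for k in range(1, s):
            if (a >> (s - 1 - k)) & 1:
                value ^= v[i - k]
        v.append(value)
    return v


def sobol_points(n_points, n_dims):
    """First `n_points` points of the sequence in [0, 1)^n_dims."""
    if not 1 <= n_dims <= len(FIRST_DIMS) + 1:
        raise ValueError("this sketch only carries direction numbers for 3 dimensions")
    v = [direction_numbers(d) for d in range(n_dims)]
    state = [0] * n_dims
    points = []
    for index in range(n_points):
        if index > 0:
            # Position of the rightmost zero bit of (index - 1).
            c, value = 0, index - 1
            while value & 1:
                value >>= 1
                c += 1
            state = [x ^ v[d][c] for d, x in enumerate(state)]
        points.append([x / 2 ** BITS for x in state])
    return points
```

Calling `sobol_points(n, 3)` gives points in the unit cube, which would then be rescaled to the actual parameter ranges.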
> >
> >
> >
> > 3. Surrogate slice sampling.
> >
> > [image: surrogate data sampling.PNG]
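
As background for this component, below is a sketch of plain univariate slice sampling with stepping out and shrinkage. To be clear, this is not the surrogate variant from the reference; it only shows the basic slice-sampling move that the surrogate method builds on. The name `slice_sample` and the default `width` are hypothetical.

```python
import math
import random


def slice_sample(log_density, x0, n_samples, width=1.0, rng=None):
    """Draw samples from an unnormalized 1-D log density by slice sampling."""
    rng = rng or random.Random()
    samples = []
    x = x0
    for _ in range(n_samples):
        # Auxiliary height drawn uniformly under the density at the current point.
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # Step out until the bracket [left, right] covers the slice around x.
        left = x - width * rng.random()
        right = left + width
        while log_density(left) > log_y:
            left -= width
        while log_density(right) > log_y:
            right += width
        # Shrink the bracket towards x until a proposal lands inside the slice.
        while True:
            proposal = rng.uniform(left, right)
            if log_density(proposal) > log_y:
                x = proposal
                break
            if proposal < x:
                left = proposal
            else:
                right = proposal
        samples.append(x)
    return samples
```

For example, `slice_sample(lambda x: -0.5 * x * x, 0.0, 1000)` draws from a standard normal; for the GP hyperparameters the log density would be the log marginal likelihood plus the log prior.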
> >
> >
> > References:
> >
> > 1. For the next point to evaluate:
> >
> > https://papers.nips.cc/paper/4522practicalbayesianoptimizationofmachinelearningalgorithms.pdf
> >
> > http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips_appendix.pdf
> >
> > 2. Quasi-random Sobol sequence generator:
> >
> > https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf
> >
> > 3. Surrogate slice sampling:
> >
> > http://homepages.inf.ed.ac.uk/imurray2/pub/10hypers/hypers.pdf
> >
> >
> >
> > Thank you so much,
> >
> > Janardhan
