Hi Sasha,

I believe, when slice sampling, if the slice is not narrow enough, as shown in the left-side graphs, there is a possibility that we are going to escape this region of the objective function. Please see Fig. 1 of the attached paper (if you haven't seen it). So, after a good number of runs the slice sampling doesn't seem to improve, but at `t = 60` the custom algorithm discussed in the paper seems to give a good result [right-side graphs].
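For reference, here is a minimal univariate slice sampler with Neal's step-out procedure (my own sketch, not the paper's custom algorithm). It makes the role of the slice width `w` concrete: a wide `w` lets the step-out phase expand the interval past the current mode, which is the "escaping the region" behavior described above.

```python
import math
import random

def slice_sample(log_f, x0, w, n_samples, max_steps=100):
    """Univariate slice sampling with step-out.

    log_f: log of the (unnormalized) target density
    x0:    starting point
    w:     initial slice width -- a wide w lets the step-out phase
           expand the interval beyond a narrow local mode
    """
    samples = []
    x = x0
    for _ in range(n_samples):
        # Draw the slice level y ~ Uniform(0, f(x)), done in log space.
        log_y = log_f(x) + math.log(1.0 - random.random())
        # Step out: grow [l, r] until both ends fall below the slice.
        l = x - w * random.random()
        r = l + w
        steps = 0
        while log_f(l) > log_y and steps < max_steps:
            l -= w
            steps += 1
        steps = 0
        while log_f(r) > log_y and steps < max_steps:
            r += w
            steps += 1
        # Shrinkage: sample uniformly from [l, r], shrinking on rejection.
        while True:
            x1 = l + (r - l) * random.random()
            if log_f(x1) > log_y:
                x = x1
                break
            if x1 < x:
                l = x1
            else:
                r = x1
        samples.append(x)
    return samples
```

With a small `w` the sampler tends to stay inside the local mode it started in; widening `w` makes jumps into neighbouring regions of the objective more likely.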

Not sure whether we have this kind of objective function in our algorithms!


Thank you very much,
Janardhan

On Tue, Sep 12, 2017 at 2:02 PM, Janardhan Pulivarthi wrote:
Hi Sasha,

1. According to clause 8.2.2 in the paper attached, the author recommends lower-dimensional B.O.
2. It seems that in most cases a small dimension for the Sobol sequence is sufficient.
3. Regarding the independence of the parameters which feed into heuristics, I dropped a mail to Prof. Ryan P. Adams and am hoping for a response soon.

I am implementing a preliminary script, as Niketan pointed out, and will let you know once I complete the skeleton.

Thanks,
Janardhan

On Wed, Aug 23, 2017 at 4:04 PM, Alexandre V Evfimievski wrote:
Hi Janardhan,

The number of parameters could be rather large; that's certainly an issue for Bayesian Optimization.  A perfect implementation would, perhaps, pick a sample of parameters and a sample of the dataset for every iteration.  It seems that Sobol sequences require generating primitive polynomials of large degree.  What is better: a higher-dimensional B.O., or a lower-dimensional one combined with parameter sampling?  Probably the latter.  By the way, in cases where parameters feed into heuristics, there may be considerable independence across the set of parameters, especially when conditioned on a specific dataset record.  Each heuristic targets certain situations that arise in some records.  Not sure how to take advantage of this.
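To make the "lower-dimensional B.O. combined with parameter sampling" option concrete, here is a rough sketch (the function name is my own, and plain random search stands in for the low-dimensional B.O. inner loop, so this is only an illustration of the structure, not a SystemML API): each outer round samples a small coordinate subset and optimizes only those coordinates while holding the rest fixed.

```python
import random

def optimize_with_parameter_sampling(loss, dim, subset_size,
                                     outer_iters=20, inner_iters=50):
    """Optimize a high-dimensional black box by repeatedly optimizing
    a random low-dimensional subset of its parameters.

    `loss` maps a list of `dim` floats in [0, 1] to a scalar.
    Random search stands in for the low-dimensional B.O. inner loop.
    """
    x = [0.5] * dim                      # current parameter vector
    best = loss(x)
    for _ in range(outer_iters):
        idx = random.sample(range(dim), subset_size)   # sampled coordinates
        for _ in range(inner_iters):
            cand = list(x)
            for i in idx:
                cand[i] = random.random()              # propose in [0, 1]
            c = loss(cand)
            if c < best:                               # keep improvements
                best, x = c, cand
    return x, best
```

The inner loop only ever touches `subset_size` coordinates, so replacing it with a real low-dimensional B.O. would keep the surrogate model small regardless of the total number of parameters.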

Thanks,
Sasha

From:    Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
To:      Alexandre V Evfimievski <evfimi@us.ibm.com>, npansar@us.ibm.com, dev@systemml.apache.org
Date:    08/10/2017 09:39 AM
Subject: Re: Bayesian optimizer support for SystemML.

Hi Sasha,

And one more thing I would like to ask: what are you thinking about the `sobol` function? What are the dimension requirement and the pattern of sampling? Please help me understand what tasks, exactly, we are going to optimize in SystemML.

Thank you very much,
Janardhan

On Wed, Jul 26, 2017 at 12:25 AM, Alexandre V Evfimievski <evfimi@us.ibm.com> wrote:
Hi, Janardhan,

We are still studying Bayesian Optimization (B.O.), you are ahead of us!  Just one comment: the "black box" loss function that is being optimized is not always totally black.  Sometimes it is a sum of many small black-box functions.  Suppose we want to train a complex system with many parameters over a large dataset.  The system involves many heuristics, and the parameters feed into these heuristics.  We want to minimize a loss function, which is a sum of individual losses per each data record.  We want to use B.O. to find an optimal vector of parameters.  The parameters affect the system's behavior in complex ways and do not allow for the computation of a gradient.  However, because the loss is a sum of many losses, when running B.O. we have a choice: either to run each test over the entire dataset, or to run over a small sample of the dataset (but try more parameter vectors per hour, say).  The smaller the sample, the higher the variance of the loss.  Not sure which implementation of B.O. is the best to handle such a case.
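A small sketch of the choice described above (the helper name and signature are hypothetical, not SystemML code): because the loss is an average of per-record losses, evaluating it on a random subsample gives an unbiased but noisier estimate, with variance shrinking roughly as one over the sample size.

```python
import random

def sample_loss(per_record_loss, data, params, sample_size=None):
    """Estimate the mean per-record loss, optionally from a subsample.

    With sample_size=None the full dataset is used (exact but slow);
    otherwise the estimate is unbiased but noisier -- its variance
    scales roughly as 1 / sample_size.
    """
    records = data if sample_size is None else random.sample(data, sample_size)
    return sum(per_record_loss(r, params) for r in records) / len(records)
```

A B.O. driver could then trade exactness against throughput: exact evaluations with `sample_size=None`, or many more parameter vectors per hour with a small `sample_size` and a correspondingly noisier objective.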

Thanks,
Alexandre (Sasha)

From:    Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
To:      dev@systemml.apache.org
Date:    07/25/2017 10:33 AM
Subject: Re: Bayesian optimizer support for SystemML.

Hi Niketan and Mike,

As we are trying to implement this Bayesian Optimization, should we take
input from more committers as well? This optimizer approach seems to have
a couple of ways to implement it, and we may need to find out which suits
us the best.

Thanks,
Janardhan

On Sat, Jul 22, 2017 at 3:41 PM, Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com> wrote:

> Dear committers,
>
> We are planning to add Bayesian optimizer support for both the ML and
> https://issues.apache.org/jira/browse/SYSTEMML-979
>
> The following is a simple outline of how we are going to implement it. Please
> feel free to make any kind of changes in this Google Docs link:
>
> http://bit.do/systemml-bayesian
>
> Description:
>
> Bayesian optimization is a sequential design strategy for global
> optimization of black-box functions that doesn't require derivatives.
>
> Process:
>
>    1. First we select the point that looks best given the number of
>       iterations that have happened so far.
>    2. Candidate point selection, with sampling from a Sobol quasirandom
>       sequence generator over the space.
>    3. Gaussian process hyperparameter sampling with the surrogate slice
>       sampling method.
>
>
> Components:
>
>    1. Selecting the next point to evaluate.
>
> [image: nextpoint.PNG]
>
> We specify a uniform prior for the mean, m, and width-2 top-hat priors for
> each of the D length-scale parameters. As we expect the observation noise
> generally to be close to or exactly zero, nu is given a horseshoe prior.
> The covariance amplitude theta0 is given a zero-mean, unit-variance
> lognormal prior, theta0 ~ ln N(0, 1).
>
>
>
>    2. Generation of the quasirandom Sobol sequence.
>
> Which kind of Sobol patterns are needed?
>
> [image: sobol patterns.PNG]
>
> How many dimensions do we need?
>
> This paper argues that its generation target dimension is 21201. [pdf link
> <https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf>]
>
>
>
>    3. Surrogate slice sampling.
>
> [image: surrogate data sampling.PNG]
>
>
> References:
>
> 1. For the next point to evaluate:
>
> https://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf
> http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips_appendix.pdf
>
> 2. QuasiRandom Sobol Sequence Generator:
>
> https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf
>
> 3. Surrogate Slice Sampling:
>
> http://homepages.inf.ed.ac.uk/imurray2/pub/10hypers/hypers.pdf
>
>
>
> Thank you = so much,
>
> Janardhan