Hi Sasha,
I believe that when slice sampling, if the slice is not narrow enough, as
shown in the left-hand graphs, there is a possibility that we will escape
this region of the objective function. Please see Fig. 1 of the attached
paper (if you haven't seen it). So, after a good number of runs the slice
sampling doesn't seem to improve, but at `t = 60` the custom algorithm
discussed in the paper seems to give a good result [right-hand graphs].
I am not sure whether we have this kind of objective function in our
algorithms.
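For reference, the role of the slice width can be seen in a minimal univariate slice sampler with the step-out procedure (a generic sketch in the style of Neal's algorithm, not the attached paper's custom method; the target density and initial width `w` are illustrative):

```python
import math
import random

def slice_sample(logpdf, x0, w=1.0, n_samples=500, rng=None):
    """Univariate slice sampling with step-out.

    If the bracket grown by step-out is wide relative to the current
    mode, it can cover neighbouring regions and the chain may jump out
    of the region it is in -- the escape behaviour discussed above.
    """
    rng = rng or random.Random(0)
    samples = []
    x = x0
    for _ in range(n_samples):
        # Draw a vertical level uniformly under the density at x.
        logy = logpdf(x) + math.log(rng.random())
        # Step out: grow the bracket until both ends leave the slice.
        left = x - w * rng.random()
        right = left + w
        while logpdf(left) > logy:
            left -= w
        while logpdf(right) > logy:
            right += w
        # Shrink: propose uniformly inside the bracket until accepted.
        while True:
            x1 = left + (right - left) * rng.random()
            if logpdf(x1) > logy:
                x = x1
                break
            if x1 < x:
                left = x1
            else:
                right = x1
        samples.append(x)
    return samples

# Standard normal target (unnormalized): draws concentrate near 0.
draws = slice_sample(lambda x: -0.5 * x * x, x0=0.0)
mean = sum(draws) / len(draws)
```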
Thank you very much,
Janardhan
On Tue, Sep 12, 2017 at 2:02 PM, Janardhan Pulivarthi <
janardhan.pulivarthi@gmail.com> wrote:
> Hi Sasha,
>
> 1. According to clause 8.2.2 of the attached paper, the author recommends
> lower-dimensional B.O.
> 2. It seems that in most cases a small dimension for the Sobol sequence is
> sufficient.
> 3. Regarding the independence of parameters that feed into heuristics, I
> dropped a mail to *Prof. Ryan P. Adams* and hope for a response soon.
>
> I am implementing a preliminary script, as Niketan pointed out, and will
> let you know once I complete the skeleton.
>
> Thanks,
> Janardhan
>
> On Wed, Aug 23, 2017 at 4:04 PM, Alexandre V Evfimievski <
> evfimi@us.ibm.com> wrote:
>
>> Hi Janardhan,
>>
>> The number of parameters could be rather large, that's certainly an issue
>> for Bayesian Optimization. A perfect implementation would, perhaps, pick a
>> sample of parameters and a sample of the dataset for every iteration. It
>> seems that Sobol sequences require generating primitive polynomials of
>> large degree. What is better: a higher-dimensional B.O., or a
>> lower-dimensional one combined with parameter sampling? Probably the
>> latter. By the way, in cases where parameters feed into heuristics, there
>> may be considerable independence across the set of parameters, especially
>> when conditioned by a specific dataset record. Each heuristic targets
>> certain situations that arise in some records. Not sure how to take
>> advantage of this.
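One way to picture the "lower-dimensional B.O. plus parameter sampling" idea is optimizing only a random coordinate subset per round while the rest stay fixed. This is a toy illustration with a made-up separable loss; the subset size and the loss itself are assumptions, not anything from SystemML:

```python
import random

def make_objective(n_params=50):
    """Hypothetical separable loss: a sum of per-parameter quadratics."""
    centers = [random.uniform(-1, 1) for _ in range(n_params)]
    def loss(x):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, centers))
    return loss, centers

def sample_coordinate_subset(n_params, k, rng):
    """Pick k coordinates to optimize this round; the rest stay fixed."""
    return rng.sample(range(n_params), k)

random.seed(7)
rng = random.Random(7)
loss, centers = make_objective()
x = [0.0] * 50

# One round: improve only a 5-dimensional random subset. Here the toy
# loss is separable, so the subspace optimum is exact; a real system
# would run B.O. within the chosen subspace instead.
subset = sample_coordinate_subset(50, 5, rng)
before = loss(x)
for i in subset:
    x[i] = centers[i]
after = loss(x)
```

With independent parameters (as conjectured above for heuristics), rounds over different subsets compose well; with strong interactions they would not.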
>>
>> Thanks,
>> Sasha
>>
>>
>>
>> From: Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
>> To: Alexandre V Evfimievski <evfimi@us.ibm.com>,
>> npansar@us.ibm.com, dev@systemml.apache.org
>> Date: 08/10/2017 09:39 AM
>>
>> Subject: Re: Bayesian optimizer support for SystemML.
>> 
>>
>>
>>
>> Hi Sasha,
>>
>> One more thing I would like to ask: what are your thoughts on the
>> `sobol` function? What is the dimension requirement and the pattern of
>> sampling? Please help me understand what tasks, exactly, we are going
>> to optimize in SystemML.
>>
>> Also, what are your thoughts on surrogate slice sampling?
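For the sampling-pattern question, a low-discrepancy sequence fills the unit cube far more evenly than i.i.d. uniform draws. A true Sobol generator needs tables of direction numbers, so this sketch substitutes a Halton sequence as a simpler stand-in; the function names are illustrative, not an existing SystemML API:

```python
def halton(index, base):
    """Van der Corput radical inverse of `index` in `base`, in [0, 1)."""
    f, result = 1.0, 0.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def low_discrepancy_points(n, dims, bases=(2, 3, 5, 7, 11)):
    """First n points of a Halton sequence in `dims` dimensions
    (one coprime base per dimension)."""
    return [[halton(i + 1, bases[d]) for d in range(dims)]
            for i in range(n)]

# Eight 2-D points; successive points subdivide the gaps left so far.
pts = low_discrepancy_points(8, 2)
```

Like Sobol, Halton degrades in high dimensions, which is why the dimension question below matters.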
>>
>> Thank you very much,
>> Janardhan
>>
>> On Wed, Jul 26, 2017 at 12:25 AM, Alexandre V Evfimievski <
>> evfimi@us.ibm.com> wrote:
>> Hi, Janardhan,
>>
>> We are still studying Bayesian Optimization (B.O.), you are ahead of us!
>> Just one comment: the "black box" loss function that is being optimized is
>> not always totally black. Sometimes it is a sum of many small black-box
>> functions. Suppose we want to train a complex system with many parameters
>> over a large dataset. The system involves many heuristics, and the
>> parameters feed into these heuristics. We want to minimize a loss
>> function, which is a sum of individual losses per each data record. We
>> want to use B.O. to find an optimal vector of parameters. The parameters
>> affect the system's behavior in complex ways and do not allow for the
>> computation of a gradient. However, because the loss is a sum of many
>> losses, when running B.O., we have a choice: either to run each test over
>> the entire dataset, or to run over a small sample of the dataset (but try
>> more parameter vectors per hour, say). The smaller the sample, the higher
>> the variance of the loss. Not sure which implementation of B.O. is the
>> best to handle such a case.
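The sample-size/variance trade-off described above is easy to quantify: scaling a sample sum gives an unbiased estimate of the total loss, with standard deviation shrinking roughly as the square root of the sample size. A small sketch with hypothetical per-record losses:

```python
import random

random.seed(0)
# Hypothetical per-record losses for a dataset of 10,000 records.
losses = [random.random() for _ in range(10_000)]
true_total = sum(losses)

def estimate_total(losses, sample_size, rng):
    """Unbiased estimate of the total loss from a uniform record sample."""
    sample = rng.sample(losses, sample_size)
    return len(losses) * sum(sample) / sample_size

rng = random.Random(1)

def spread(sample_size, trials=200):
    """Empirical standard deviation of the estimator across trials."""
    ests = [estimate_total(losses, sample_size, rng) for _ in range(trials)]
    m = sum(ests) / trials
    return (sum((e - m) ** 2 for e in ests) / trials) ** 0.5

small, large = spread(100), spread(2000)  # small sample -> noisier estimate
```

A B.O. implementation that models this observation noise explicitly (e.g. via the GP noise term) could then trade evaluations-per-hour against variance deliberately.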
>>
>> Thanks,
>> Alexandre (Sasha)
>>
>>
>>
>> From: Janardhan Pulivarthi <janardhan.pulivarthi@gmail.com>
>> To: dev@systemml.apache.org
>> Date: 07/25/2017 10:33 AM
>> Subject: Re: Bayesian optimizer support for SystemML.
>> 
>>
>>
>>
>> Hi Niketan and Mike,
>>
>> As we are trying to implement this Bayesian optimization, should we take
>> input from more committers as well? This optimizer seems to have a couple
>> of possible implementation approaches, and we may need to find out which
>> suits us best.
>>
>> Thanks,
>> Janardhan
>>
>> On Sat, Jul 22, 2017 at 3:41 PM, Janardhan Pulivarthi <
>> janardhan.pulivarthi@gmail.com> wrote:
>>
>> > Dear committers,
>> >
>> > We are planning to add Bayesian optimizer support for both the ML and
>> > deep learning tasks in SystemML. Relevant JIRA link:
>> > https://issues.apache.org/jira/browse/SYSTEMML-979
>> >
>> > The following is a simple outline of how we are going to implement it.
>> > Please feel free to make any changes, in this Google Docs link:
>> > http://bit.do/systemmlbayesian
>> >
>> > Description:
>> >
>> > Bayesian optimization is a sequential design strategy for the global
>> > optimization of black-box functions that does not require derivatives.
>> >
>> > Process:
>> >
>> > 1. First, select the point that is the best so far, given the number of
>> > iterations that have happened.
>> >
>> > 2. Candidate point selection, by sampling the space with a Sobol
>> > quasi-random sequence generator.
>> >
>> > 3. Gaussian process hyperparameter sampling with the surrogate slice
>> > sampling method.
>> >
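The process steps above can be sketched as a loop. The GP surrogate and its slice-sampled hyperparameters are stubbed out here with a crude distance-based acquisition (a stand-in for expected improvement); every name is illustrative:

```python
import random

def bayes_opt(objective, candidates, n_iter, rng):
    """Skeleton of the three-step process: score quasi-random candidates
    with an acquisition function, evaluate the best candidate, and keep
    the incumbent. A real implementation would fit a GP to `history` and
    slice-sample its hyperparameters before computing the acquisition."""
    history = []  # (x, y) observations
    for _ in range(n_iter):
        # Stub acquisition: prefer candidates far from evaluated points,
        # a crude stand-in for expected improvement under the GP.
        def acquisition(x):
            if not history:
                return rng.random()
            return min(abs(x - hx) for hx, _ in history)
        x_next = max(candidates, key=acquisition)
        history.append((x_next, objective(x_next)))
    # Step 1: the incumbent, i.e. the best point seen so far.
    return min(history, key=lambda p: p[1])

rng = random.Random(0)
cands = [i / 32.0 for i in range(33)]  # uniform grid as a Sobol stand-in
best_x, best_y = bayes_opt(lambda x: (x - 0.3) ** 2, cands, 12, rng)
```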
>> >
>> > Components:
>> >
>> > 1. Selecting the next point to evaluate.
>> >
>> > [image: nextpoint.PNG]
>> >
>> > We specify a uniform prior for the mean, m, and width-2 top-hat priors
>> > for each of the D length-scale parameters. As we expect the observation
>> > noise generally to be close to or exactly zero, v (nu) is given a
>> > horseshoe prior. The covariance amplitude theta0 is given a zero-mean,
>> > unit-variance log-normal prior, theta0 ~ ln N(0, 1).
>> >
>> >
>> >
>> > 2. Generation of a quasi-random Sobol sequence.
>> >
>> > Which kinds of Sobol patterns are needed?
>> >
>> > [image: sobol patterns.PNG]
>> >
>> > How many dimensions do we need? This paper argues that its generation
>> > target dimension is 21201: [pdf link
>> > https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf]
>> >
>> >
>> >
>> > 3. Surrogate slice sampling.
>> >
>> > [image: surrogate data sampling.PNG]
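The hyperparameter priors listed under component 1 can be sketched as a single draw. The support bounds for the uniform and top-hat priors are illustrative assumptions (the paper's exact bounds may differ), and the horseshoe draw uses the standard half-Cauchy-mixed-normal construction:

```python
import math
import random

def sample_gp_hyperparams(D, rng):
    """One draw from the GP hyperparameter priors described above.

    Assumed supports: mean in [-1, 1], length scales in [0, 2]."""
    m = rng.uniform(-1.0, 1.0)  # uniform prior on the mean
    # Width-2 top-hat (uniform) prior on each of the D length scales.
    length_scales = [rng.uniform(0.0, 2.0) for _ in range(D)]
    # Horseshoe prior on the noise nu: normal with half-Cauchy scale,
    # folded to be non-negative since noise cannot be negative.
    lam = abs(math.tan(math.pi * (rng.random() - 0.5)))  # half-Cauchy(0,1)
    nu = abs(rng.gauss(0.0, lam))
    # Log-normal prior on the covariance amplitude: theta0 ~ ln N(0, 1).
    theta0 = math.exp(rng.gauss(0.0, 1.0))
    return m, length_scales, nu, theta0

rng = random.Random(3)
m, ls, nu, theta0 = sample_gp_hyperparams(D=4, rng=rng)
```

Surrogate slice sampling (reference 3 below) would update such a draw given the observed data, rather than sampling from the prior alone.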
>> >
>> >
>> > References:
>> >
>> > 1. For the next point to evaluate:
>> > https://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf
>> > http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips_appendix.pdf
>> >
>> > 2. Quasi-random Sobol sequence generator:
>> > https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf
>> >
>> > 3. Surrogate slice sampling:
>> > http://homepages.inf.ed.ac.uk/imurray2/pub/10hypers/hypers.pdf
>> >
>> >
>> >
>> > Thank you so much,
>> >
>> > Janardhan
>> >