commons-issues mailing list archives

From "Gilles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-817) Multivariate Normal Mixture Model Fitting by Expectation Maximization
Date Tue, 13 Nov 2012 11:24:11 GMT

    [ https://issues.apache.org/jira/browse/MATH-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496118#comment-13496118 ]

Gilles commented on MATH-817:
-----------------------------

bq. [...] public functions after initialization [...] 

Sorry for the misunderstanding, but again that's not what I mean. Those public functions would
help the user to set up the necessary arguments _before_ initialization. It's _not_ the business
of the optimization algorithm to figure out the initial guesses: whether those are chosen
carefully or randomly or estimated from sample data, the algorithm, as such, starts its actual
work with a fully defined mixture of Gaussian distributions.

A lean API also makes for clearer, more maintainable code; so we should strive to have each
algorithm's implementation focus on its job. Helper utilities, such as initialization by estimation,
by randomization, or from an incomplete specification, can come later, e.g. as subclasses or as
utility functions.
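
As a rough illustration of the "utility function" option, the sketch below keeps the estimation of a starting point outside of any fitter. Everything in it is an assumption made for the example: the names "InitialGuess", "InitialGuessHelper" and "fromSortedChunks" are hypothetical, the data are one-dimensional for brevity, and the chunking heuristic is only one of many possible ways to derive a starting point.

```java
/** Hypothetical value object: one initial mean and variance per component (1-D case). */
final class InitialGuess {
    final double[] means;
    final double[] variances;

    InitialGuess(double[] means, double[] variances) {
        this.means = means.clone();
        this.variances = variances.clone();
    }
}

/** Hypothetical helper, deliberately kept apart from the fitting algorithm. */
final class InitialGuessHelper {
    private InitialGuessHelper() {}

    /**
     * Sorts a copy of the sample, splits it into k contiguous chunks and
     * returns the mean and (population) variance of each chunk.
     * A chunk holding a single point yields a variance of zero.
     */
    static InitialGuess fromSortedChunks(double[] sample, int k) {
        if (k < 1 || sample.length < k) {
            throw new IllegalArgumentException("need at least one point per component");
        }
        double[] sorted = sample.clone();
        java.util.Arrays.sort(sorted);

        double[] means = new double[k];
        double[] variances = new double[k];
        for (int c = 0; c < k; c++) {
            int from = c * sorted.length / k;
            int to = (c + 1) * sorted.length / k;
            int n = to - from;

            double sum = 0;
            for (int i = from; i < to; i++) {
                sum += sorted[i];
            }
            means[c] = sum / n;

            double squares = 0;
            for (int i = from; i < to; i++) {
                double d = sorted[i] - means[c];
                squares += d * d;
            }
            variances[c] = squares / n;
        }
        return new InitialGuess(means, variances);
    }
}
```

A fitter written along these lines would only ever receive a complete "InitialGuess"; whether it came from this helper, from random draws or from the user's own knowledge is invisible to the algorithm.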

bq. A method called setInitialMeans(double[][] initialMeans) [...]

Please don't do that. The steps (i.e. construction, initialization, number of method calls)
needed to perform some action and get a reliable result should be as few as possible. Among
other things, it is better to consider that construction _is_ initialization, thereby
removing the need for an additional initialization step (and the risk that this step is forgotten
during usage).
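
A minimal sketch of "construction _is_ initialization", under stated assumptions: the class name "ConstructedFitter" is hypothetical, the data are one-dimensional, and the body of "fit" is a simple stand-in rather than an actual expectation-maximization step. The point is only that the object cannot exist in a half-initialized state.

```java
/** Hypothetical 1-D fitter: every initial estimate arrives through the constructor. */
final class ConstructedFitter {
    private final double[] initialMeans;
    private final double[] initialVariances;

    ConstructedFitter(double[] initialMeans, double[] initialVariances) {
        if (initialMeans.length == 0 || initialMeans.length != initialVariances.length) {
            throw new IllegalArgumentException("need one mean and one variance per component");
        }
        for (double v : initialVariances) {
            if (!(v > 0)) {
                throw new IllegalArgumentException("variances must be positive");
            }
        }
        // Defensive copies: later changes to the caller's arrays cannot alter the fitter.
        this.initialMeans = initialMeans.clone();
        this.initialVariances = initialVariances.clone();
    }

    int numberOfComponents() {
        return initialMeans.length;
    }

    /**
     * Stand-in for the real algorithm (NOT expectation maximization): assigns each
     * point to the component with the smallest variance-scaled squared distance and
     * returns the mean of each group. A component without points keeps its initial mean.
     */
    double[] fit(double[] data) {
        int k = initialMeans.length;
        double[] sums = new double[k];
        int[] counts = new int[k];
        for (double x : data) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                double d = x - initialMeans[c];
                double dist = d * d / initialVariances[c];
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            sums[best] += x;
            counts[best]++;
        }
        double[] means = new double[k];
        for (int c = 0; c < k; c++) {
            means[c] = counts[c] == 0 ? initialMeans[c] : sums[c] / counts[c];
        }
        return means;
    }
}
```

Since there are no setters, there is no call order to remember: an inconsistent specification fails at construction, and any instance that exists can be passed to "fit" right away.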

Of course, there are mixed cases where there is no clear-cut separation between data that
can be fixed at construction and arguments that can be passed to the object's methods.

A typical example would be the number of components which the fitted mixture should contain.
Is it a parameter to be fixed at the fitter's construction?
{code}
public class EMFitter1 {
  final int numberOfComponents;

  public EMFitter1(int numComp) {
    numberOfComponents = numComp;
  }

  public MixtureMultivariateRealDistribution<MultivariateNormalDistribution> fit(double[][] data) {
    // Fit a mixture with "numberOfComponents" components.
  }
}
{code}
Or should it be an additional argument to the "fit" method?
{code}
public class EMFitter2 {
  public EMFitter2() {}

  public MixtureMultivariateRealDistribution<MultivariateNormalDistribution> fit(int numComp,
                                                                                 double[][] data) {
    // Fit a mixture with "numComp" components.
  }
}
{code}
In this latter case, the rationale would be that the number of components is a "parameter"
of the algorithm that should not require a new object.
But note that it is a matter of interpretation: in the case of "EMFitter1", there is an equally
valid rationale in saying that an instance of the fitter encapsulates the fitting by a fixed
number of components!

bq. [...] it may be OK and make for a clearer API to have some public members allowing specification
of various initial estimates.

This approach already has the problem of leaving users to wonder what happens to the "initial
covariances" when they call "setInitialMeans": Are the covariances set to random values, or
kept at their previous values? What happens if there are no previous values?
Another problem is that it adds a number of steps and makes the API more susceptible to wrong
usage.
If you want to allow for multiple calls to "fit" with different parameters, we might want
to use the same approach as we are implementing in the "optimization" package (with the new
interface "OptimizationData"). Could you please have a look? [But please note that we went
that "far"[1] in order to accommodate various algorithms that needed _different_ parameter
types within the same API.]

[1] "OptimizationData" is just a marker interface (i.e. with no functionality) and that's not
something to be abused too much.
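
For readers unfamiliar with the marker-interface idea, here is a rough, self-contained sketch of the general pattern. It is not the code of the "optimization" package: the names "FitSetting", "NumComponents", "MaxIterations" and "ConfigurableFitter" are hypothetical, as are the default values, and "fit" only reports which settings it picked up instead of fitting anything.

```java
/** Hypothetical marker interface: it declares no methods and only tags accepted types. */
interface FitSetting {}

/** Hypothetical setting: number of mixture components. */
final class NumComponents implements FitSetting {
    final int value;

    NumComponents(int value) {
        this.value = value;
    }
}

/** Hypothetical setting: iteration limit. */
final class MaxIterations implements FitSetting {
    final int value;

    MaxIterations(int value) {
        this.value = value;
    }
}

final class ConfigurableFitter {
    /**
     * Scans the variable-length argument list, keeps the settings it understands
     * and silently ignores the others. Returns a summary instead of a fitted model.
     */
    String fit(double[] data, FitSetting... settings) {
        int numComponents = 1;    // assumed default
        int maxIterations = 100;  // assumed default
        for (FitSetting s : settings) {
            if (s instanceof NumComponents) {
                numComponents = ((NumComponents) s).value;
            } else if (s instanceof MaxIterations) {
                maxIterations = ((MaxIterations) s).value;
            }
        }
        return "components=" + numComponents
            + ", maxIterations=" + maxIterations
            + ", points=" + data.length;
    }
}
```

The flexibility has a price: the compiler can no longer tell the caller that a required setting is missing or that a supplied one is ignored, which is one reason to use the pattern sparingly.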
                
> Multivariate Normal Mixture Model Fitting by Expectation Maximization
> ---------------------------------------------------------------------
>
>                 Key: MATH-817
>                 URL: https://issues.apache.org/jira/browse/MATH-817
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Jared Becksfort
>            Priority: Minor
>         Attachments: AbstractMultivariateRealDistribution.java.patch, MixtureMultivariateRealDistribution.java.patch, MultivariateNormalDistribution.java.patch, MultivariateNormalMixtureExpectationMaximizationFitter.java, MultivariateNormalMixtureExpectationMaximizationFitterTest.java
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I will submit a class for fitting Multivariate Normal Mixture Models using Expectation Maximization.
> > Hello,
> >
> > I have implemented some classes for multivariate Normal distributions, multivariate normal mixture models, and an expectation maximization fitting class for the mixture model. I would like to submit it to Apache Commons Math. I still have some touching up to do so that they fit the style guidelines and implement the correct interfaces. Before I do so, I thought I would at least ask if the developers of the project are interested in me submitting them.
> >
> > Thanks,
> > Jared Becksfort
> Dear Jared,
> Yes, that would be very nice to have such an addition! Remember to also include unit tests (refer to the current ones for examples). The best would be to split a submission up into multiple minor ones, each covering a natural submission (e.g. multivariate Normal distribution in one submission), and create an issue as described at http://commons.apache.org/math/issue-tracking.html .
> If you run into any problems, please do not hesitate to ask on this mailing list.
> Cheers, Mikkel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
