mahout-user mailing list archives

From Vasil Vasilev <vavasi...@gmail.com>
Subject Re: Incorrect calculation of pdf
Date Tue, 28 Jun 2011 13:01:33 GMT
In fact my idea was very simple, although I do not know whether it will work:
do all calculations in log space and exponentiate the result just before
returning. This would not change the function's expected result.
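
To make the idea concrete, here is a minimal sketch, assuming a Gaussian with
independent dimensions. The class LogPdfSketch and its methods are hypothetical
names for illustration only and are not Mahout's GaussianCluster code. It
compares multiplying per-dimension densities with summing their logarithms at
a dimensionality of 50000.

```java
public class LogPdfSketch {

    // Hypothetical helper: log density of a Gaussian with independent
    // dimensions, accumulated as a sum of per-dimension log densities.
    public static double logPdf(double[] x, double[] mean, double[] sd) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / sd[i];
            sum += -0.5 * z * z - Math.log(sd[i]) - 0.5 * Math.log(2.0 * Math.PI);
        }
        return sum;
    }

    // The same density computed as a direct product of per-dimension
    // densities; the running product can underflow in high dimensions.
    public static double pdfByProduct(double[] x, double[] mean, double[] sd) {
        double p = 1.0;
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / sd[i];
            p *= Math.exp(-0.5 * z * z) / (sd[i] * Math.sqrt(2.0 * Math.PI));
        }
        return p;
    }

    public static void main(String[] args) {
        int n = 50000;                 // dimensionality mentioned in the thread
        double[] x = new double[n];    // the point equals the mean (all zeros)
        double[] mean = new double[n];
        double[] sd = new double[n];
        java.util.Arrays.fill(sd, 1.0);

        System.out.println("product of densities: " + pdfByProduct(x, mean, sd));
        System.out.println("sum of log densities: " + logPdf(x, mean, sd));
        System.out.println("exp of the log sum:   " + Math.exp(logPdf(x, mean, sd)));
    }
}
```

In this sketch the log value stays finite, but exponentiating it just before
returning underflows again, because the true density is far below the smallest
positive double. So the log-space result is probably only useful if the caller
receives it as a log value.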

On Mon, Jun 27, 2011 at 9:03 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Actually, pdf() should always be a pdf(), not a logPdf().  Many algorithms
> want one or the other.  Some don't much care because log is monotonic.  But
> we should do what the name implies.
>
> On Mon, Jun 27, 2011 at 10:15 AM, Jeff Eastman <jeastman@narus.com> wrote:
>
> > A better approach would be to create a new Model and ModelDistribution
> > that uses log arithmetic of your choosing. The initial models are very
> > simple-minded and are likely not adequate for real applications.
> >
> > -----Original Message-----
> > From: Ted Dunning [mailto:ted.dunning@gmail.com]
> > Sent: Monday, June 27, 2011 7:51 AM
> > To: user@mahout.apache.org
> > Subject: Re: Incorrect calculation of pdf
> >
> > There should not be a change to an existing method.
> >
> > It would be fine to add another method, perhaps called logPdf, that does
> > what you suggest.  This loss of precision is common with the normal
> > distribution in high dimensions.
> >
> > On Mon, Jun 27, 2011 at 1:49 AM, Vasil Vasilev <vavasilev@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Recently I wanted to use the Dirichlet clustering algorithm to cluster
> > > vectors taken directly from vectorized text, whose dimensionality was
> > > around 50000. In this situation the algorithm fails to calculate the pdf
> > > of a vector corresponding to a cluster center due to problems with
> > > numerical precision during multiplication.
> > >
> > > In this regard, what do you think of modifying the
> > > GaussianCluster.pdf() method in such a way that it works with
> > > logarithmic probabilities?
> > >
> > > Regards, Vasil
> > >
> >
>
