mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Why is Lanczos deprecated?
Date Fri, 02 Aug 2013 17:54:21 GMT
the only time you would not get good results is if spectrum does not have a
good decay. Which is equivalent to mostly same variance in most of original
basis directions. This problem is similar to problem that arises with PCA
when you try to do dimensionality reduction with retaining certain %-tage
of variance. in case of flat spectrum decay, you'd need much bigger k to
retain same amount of variance in dimensionally reduced projection. In that
sense SSVD solution for a given k is as good as PCA gets for the same k.
Also, i believe (but not 100% sure) "problems too small" exhibit higher
errors due to the law of large numbers.


On Fri, Aug 2, 2013 at 10:41 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> if you use k > 40 you are already beating Lanczos for larger datasets.
> k>10 is unlikely meaninful. p need not be more than 15% of k (default is
> 15). use q=1, q>1 does not yield tangible improvements in real world.
>  Again, see Nathan Halko's dissertation on accuracy comparison.
>
>
>
> On Fri, Aug 2, 2013 at 4:17 AM, Fernando Fernández <
> fernando.fernandez.gonzalez@gmail.com> wrote:
>
>> Keeping Lanczos would be nice, Like I said, it's currently being used in
>> some projects with good results and I think it's easier to tune so it
>> would
>> be my first choice for future developments. I still need to further test
>> SSVD, specially because in the current example I'm working it yields very
>> different results from Lanczos. We are investigating if it can be due to a
>> bug when loading the data, though dimensions of the ouptut seem ok, or if
>> it's a question of increasing p or q parameters. If it's a question of
>> increasing p and q I think running times would make SSVD not viable. I
>> hope
>> to be able to provide some comparison figures in terms of precision and
>> running time in a month or so.
>>
>> I hope that other users reads this and say wether they are using Lanczos.
>>
>> Best,
>> Fernando.
>>
>> 2013/8/2 Sebastian Schelter <ssc@apache.org>
>>
>> > I would also be fine with keeping if there is demand. I just proposed to
>> > deprecate it and nobody voted against that at that point in time.
>> >
>> > --sebastian
>> >
>> >
>> > On 02.08.2013 03:12, Dmitriy Lyubimov wrote:
>> > > There's a part of Nathan Halko's dissertation referenced on algorithm
>> > page
>> > > running comparison.  In particular, he was not able to compute more
>> than
>> > 40
>> > > eigenvectors with Lanczos on wikipedia dataset. You may refer to that
>> > > study.
>> > >
>> > > On the accuracy part, it was not observed that it was a problem,
>> assuming
>> > > high level of random noise is not the case, at least not in LSA-like
>> > > application used there.
>> > >
>> > > That said, i am all for diversity of tools, I would actually be +0 on
>> > > deprecating Lanczos, it is not like we are lacking support for it.
>> SSVD
>> > > could use improvements too.
>> > >
>> > >
>> > > On Thu, Aug 1, 2013 at 3:15 AM, Fernando Fernández <
>> > > fernando.fernandez.gonzalez@gmail.com> wrote:
>> > >
>> > >> Hi everyone,
>> > >>
>> > >> Sorry if I duplicate the question but I've been looking for an answer
>> > and I
>> > >> haven't found an explanation other than it's not being used (together
>> > with
>> > >> some other algorithms). If it's been discussed in depth before maybe
>> you
>> > >> can point me to some link with the discussion.
>> > >>
>> > >> I have successfully used Lanczos in several projects and it's been
a
>> > >> surprise to me finding that the main reason (according to what I've
>> read
>> > >> that might not be the full story) is that it's not being used. At the
>> > >> begining I supposed it was because SSVD is supposed to be much faster
>> > with
>> > >> similar results, but after making some tests I have found that
>> running
>> > >> times are similar or even worse than lanczos for some configurations
>> (I
>> > >> have tried several combinations of parameters, given child processes
>> > enough
>> > >> memory, etc. and had no success in running SSVD at least in 3/4 of
>> time
>> > >> Lanczos runs, thouh they might be some combinations of parameters I
>> have
>> > >> still not tried). It seems to be quite tricky to find a good
>> > combination of
>> > >> parameters for SSVD and I have seen also a precision loss in some
>> > examples
>> > >> that makes me not confident in migrating Lanczos to SSVD from now on
>> > (How
>> > >> far can I trust results from a combination of parameters that runs
in
>> > >> significant less time, or at least a good time?).
>> > >>
>> > >> Can someone convince me that SSVD is actually a better option than
>> > Lanczos?
>> > >> (I'm totally willing to be convinced... :) )
>> > >>
>> > >> Thank you very much in advance.
>> > >>
>> > >> Fernando.
>> > >>
>> > >
>> >
>> >
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message