mahout-user mailing list archives

From Suneel Marthi <suneel.mar...@gmail.com>
Subject Re: Text clustering with SVD
Date Mon, 30 Mar 2015 20:06:29 GMT
Lanczos has since been deprecated and will be removed in the upcoming
release, so please desist from using/suggesting Lanczos.


On Mon, Mar 30, 2015 at 3:00 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> I am not aware of _any_ scenario under which Lanczos would be faster (see
> N. Halko's dissertation for comparisons), although admittedly I did not
> study all possible cases.
>
> Having -k=100 is probably enough for anything. I would not recommend
> running -q>0 for k>100, as the power iterations step would become quite
> slow.
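To see why -q>0 gets expensive as k grows, here is the randomized SVD scheme from Halko et al. that SSVD follows, written in notation chosen for this note rather than taken from Mahout: A is the m x n input (one tf-idf row per document), \Omega is a random n x (k+p) test matrix, and p is a small oversampling amount.

```latex
\begin{aligned}
Y &= (A A^{\top})^{q} A\,\Omega, \qquad A \in \mathbb{R}^{m \times n},\ \Omega \in \mathbb{R}^{n \times (k+p)} \\
Y &= Q R \quad \text{(thin QR factorization)} \\
B &= Q^{\top} A \in \mathbb{R}^{(k+p) \times n} \\
B &= \hat{U} \Sigma V^{\top}, \qquad U \approx Q \hat{U} \quad \text{(keep the leading } k \text{ columns of } U,\ \Sigma,\ V)
\end{aligned}
```

Each power iteration adds one multiplication by A^T and one by A, i.e. two more passes over the data, and each of those costs on the order of nnz(A)(k+p) operations. The extra work therefore grows with both q and k. In practice the iterates are re-orthonormalized between multiplications for numerical stability; the formula shows only the algebra.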
>
> As for your other questions, e.g. the U*Sigma output, see the "overview and
> usage" link here:
> http://mahout.apache.org/users/dim-reduction/ssvd.html
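In Suneel's command further down, `-us true` is what requests the U*Sigma output. As a hedged illustration of what that product is (not Mahout code; the class and method names are hypothetical), U*Sigma simply scales column j of U by the j-th singular value, so each row becomes one document's coordinates in the k-dimensional space:

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Mahout): forms U * Sigma from a dense U (m x k) and k singular values. */
final class UsSigma {
    private UsSigma() {}

    static double[][] multiplyBySigma(double[][] u, double[] sigma) {
        double[][] us = new double[u.length][];
        for (int i = 0; i < u.length; i++) {
            if (u[i].length != sigma.length) {
                throw new IllegalArgumentException(
                        "row " + i + " has " + u[i].length + " columns, expected " + sigma.length);
            }
            us[i] = new double[sigma.length];
            for (int j = 0; j < sigma.length; j++) {
                // Multiplying by the diagonal Sigma scales column j of U by sigma_j.
                us[i][j] = u[i][j] * sigma[j];
            }
        }
        return us;
    }

    public static void main(String[] args) {
        double[][] u = {{0.6, 0.8}, {0.8, -0.6}};  // toy 2 x 2 U with orthonormal columns
        double[] sigma = {3.0, 1.0};               // toy singular values
        System.out.println(Arrays.deepToString(multiplyBySigma(u, sigma)));
    }
}
```

Because A = U Sigma V^T and V has orthonormal columns, U*Sigma also equals A*V: the projection of each (possibly mean-centered) tf-idf row onto the top k right singular vectors.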
>
> On Mon, Mar 30, 2015 at 2:19 AM, Donni Khan <prince.donnii@googlemail.com> wrote:
>
> > Hello Suneel,
> > Thanks for the fast reply.
> > Is SSVD like SVD? Which one is better?
> > I ran SSVD from Java code on my data, but how do I compute U*Sigma? Can
> > I do that with Mahout?
> > Is there an optimal method to determine K?
> >
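On choosing K: the thread gives no rule beyond Dmitriy's remark above that k=100 is probably enough. One common heuristic, offered here as an assumption rather than a Mahout feature, is to take the smallest k whose singular values retain a target fraction of the total squared singular-value mass. That total equals the squared Frobenius norm of the input (the sum of squares of all tf-idf entries), so it can be computed without a full SVD. A minimal sketch with hypothetical names:

```java
/** Hypothetical helper: smallest k whose leading singular values retain a target share of ||A||_F^2. */
final class ChooseK {
    private ChooseK() {}

    /**
     * @param sigma       leading singular values in non-increasing order (e.g. from one run with a generous k)
     * @param frobeniusSq squared Frobenius norm of A, which equals the sum of ALL sigma_i^2
     * @param target      fraction in (0, 1], e.g. 0.9
     * @return smallest k reaching the target, or -1 if the supplied singular values never reach it
     */
    static int smallestK(double[] sigma, double frobeniusSq, double target) {
        if (frobeniusSq <= 0 || target <= 0 || target > 1) {
            throw new IllegalArgumentException("need frobeniusSq > 0 and 0 < target <= 1");
        }
        double retained = 0.0;
        for (int k = 0; k < sigma.length; k++) {
            retained += sigma[k] * sigma[k];  // running sum of the squared leading singular values
            if (retained / frobeniusSq >= target) {
                return k + 1;
            }
        }
        return -1;  // target not reached: rerun SSVD with a larger k
    }

    public static void main(String[] args) {
        double[] sigma = {4.0, 2.0, 1.0};  // made-up leading singular values
        double frobeniusSq = 25.0;         // made-up ||A||_F^2: 21 captured above, 4 in the untruncated tail
        System.out.println(smallestK(sigma, frobeniusSq, 0.8));
    }
}
```

If you run the PCA flow (`-pca true`), the total should be the squared Frobenius norm of the mean-centered matrix instead.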
> > Another question: how do I relate the SSVD output to the word
> > dictionary (the real words)?
> >
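On relating the output to real words: seq2sparse also writes a dictionary that maps each term to its column index in the tf-idf vectors (assumed here to be already loaded into a `Map<Integer, String>`). The rows of V are indexed by those same term ids, so the largest-magnitude entries in a column of V name the terms that define that component. Note that the command in step 2 below passes `-V false`, so you would presumably need `-V true` to have V written. A sketch with hypothetical names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: terms with the largest absolute loadings in one column of V (terms x k). */
final class TopTerms {
    private TopTerms() {}

    static List<String> topTerms(double[][] v, int component, Map<Integer, String> dictionary, int n) {
        List<Integer> termIds = new ArrayList<>();
        for (int t = 0; t < v.length; t++) {
            termIds.add(t);
        }
        // Sort term ids by |loading| in the chosen component, largest first (the sort is stable).
        termIds.sort(Comparator.comparingDouble((Integer t) -> Math.abs(v[t][component])).reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, termIds.size()); i++) {
            int t = termIds.get(i);
            // Fall back to a placeholder if the id is missing from the loaded dictionary.
            result.add(dictionary.getOrDefault(t, "<unknown:" + t + ">"));
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] v = {{0.1, 0.7}, {-0.9, 0.1}, {0.4, -0.7}};  // toy V: 3 terms, k = 2
        Map<Integer, String> dict = Map.of(0, "cluster", 1, "matrix", 2, "mahout");  // toy dictionary
        System.out.println(topTerms(v, 0, dict, 2));
    }
}
```

Absolute values are used because the sign of each singular vector is arbitrary.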
> > Thank you
> > Donni
> >
> > On Mon, Mar 30, 2015 at 10:04 AM, Suneel Marthi <suneel.marthi@gmail.com> wrote:
> >
> > > Here are the steps if you are using Mahout-mrlegacy in the present Mahout
> > > trunk:
> > >
> > > 1. Generate tfidf vectors from the input corpus using seq2sparse (I am
> > > assuming you have done this before, so I am skipping the details)
> > >
> > > 2. Run SSVD on the generated tfidf vectors from (1)
> > >
> > >       ./bin/mahout ssvd -i <tfidf vectors> -o <svd output> -k 80 -pca true -us true -U false -V false
> > >
> > >      k = no. of reduced basis vectors
> > >
> > >     You would need the U*Sigma output of the PCA flow for the next
> > > clustering step
> > >
> > > 3. Run KMeans (or any other clustering algo) with the U*Sigma from (2) as
> > > input.
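Step 3 can be illustrated with a plain Lloyd's k-means over the rows of U*Sigma. This is only a sketch of the idea, not Mahout's KMeans driver: the class name is hypothetical, initialization naively takes the first k rows, and the distance is hard-coded to squared Euclidean.

```java
import java.util.Arrays;

/** Hypothetical, minimal Lloyd's k-means over the rows of U*Sigma (one row per document). */
final class KMeansSketch {
    private KMeansSketch() {}

    static int[] cluster(double[][] points, int k, int maxIter) {
        if (k <= 0 || k > points.length) {
            throw new IllegalArgumentException("need 1 <= k <= number of points");
        }
        int dim = points[0].length;
        // Naive seeding: the first k points (real code would use random or k-means++ seeding).
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point goes to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0.0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[i][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break;  // converged: no assignment moved
            }
            // Update step: each centroid moves to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assign[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assign[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue;  // keep an empty cluster's previous centroid
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] usSigma = {{0.0, 0.1}, {5.0, 5.1}, {0.2, 0.0}, {5.2, 4.9}};  // toy U*Sigma rows
        System.out.println(Arrays.toString(cluster(usSigma, 2, 100)));
    }
}
```

Each entry of the returned array is the cluster id of the corresponding document row.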
> > >
> > >
> > > On Mon, Mar 30, 2015 at 3:39 AM, Donni Khan <prince.donnii@googlemail.com> wrote:
> > >
> > > > Hello Mahout users,
> > > >
> > > > I'm working on text clustering, and I would like to reduce the features
> > > > to enhance the clustering process.
> > > > I would like to use Singular Value Decomposition before the clustering
> > > > process. I would be thankful to hear from anyone who has used this
> > > > before. Is it a good idea for clustering?
> > > > Is there any other method in Mahout to reduce the text features before
> > > > clustering?
> > > > Does anyone have an idea how I can apply SVD using Java code?
> > > >
> > > > Thanks in advance,
> > > > Donni
> > > >
> > >
> >
>
