mahout-user mailing list archives

From Mike Perry <mikeperrycan...@gmail.com>
Subject Re: Sparse Vectors
Date Sun, 21 Nov 2010 14:28:32 GMT
Thanks Ted for the answer.

"Should be sparse, but I can't say for sure."

Could anybody confirm? In the quickstart-kmeans.sh script there's a step to
convert the data to SequenceFile format (seqdirectory), followed by a second
step to convert the SequenceFiles to sparse vector format (seq2sparse).
That's why I'm asking.


On Sat, Nov 20, 2010 at 3:45 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> On Sat, Nov 20, 2010 at 8:47 AM, Mike Perry <mikeperrycanada@gmail.com> wrote:
>
> > Hello all,
> >
> > Does the script to convert a Lucene index to Mahout vectors write sequence
> > files in sparse vector representation? My impression is that it doesn't, but
> > I want to verify that.
> >
>
> Should be sparse, but I can't say for sure.
>
>
> > Also, SparseVectorsFromSequenceFiles is used to convert the vectors to
> > sparse format (I know about the seq2sparse option). Could someone point out
> > where in the code it actually constructs the sparse vectors? It seems to me
> > that one of the methods in DictionaryVectorizer generates the vectors, but I
> > couldn't find where exactly.
> >
>
> Look for VectorWritable.
>
