Thanks, Ted, for the answer.
"Should be sparse, but I can't say for sure."
Could anybody confirm? In the quickstart-kmeans.sh script there's a step to
convert the data to SequenceFile format (seqdirectory) and then
a second step to convert the SequenceFiles to sparse vector format
(seq2sparse). That's why I'm asking whether the Lucene conversion script
already writes sparse vectors or needs a similar second step.
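
For reference, here is a minimal Python sketch of what "sparse vector format" means in this question: a dictionary pass that gives each term an integer index, followed by a pass that stores only the non-zero term counts per document. This is a conceptual illustration only, not Mahout code; the function names (build_dictionary, to_sparse_vector, to_dense_vector) and the sample documents are hypothetical, and the real seq2sparse job may differ in its details.

```python
from collections import Counter


def build_dictionary(docs):
    """Assign each distinct term an integer index, in first-seen order."""
    dictionary = {}
    for tokens in docs:
        for term in tokens:
            if term not in dictionary:
                dictionary[term] = len(dictionary)
    return dictionary


def to_sparse_vector(tokens, dictionary):
    """Return {term_index: count}, keeping only terms present in the document."""
    counts = Counter(tokens)
    return {dictionary[t]: c for t, c in counts.items() if t in dictionary}


def to_dense_vector(sparse, size):
    """Expand a sparse vector into a full-length list, zeros included."""
    dense = [0] * size
    for index, count in sparse.items():
        dense[index] = count
    return dense


# Hypothetical, already-tokenized documents.
docs = [["mahout", "kmeans", "mahout"], ["lucene", "index"]]
dictionary = build_dictionary(docs)
sparse = [to_sparse_vector(d, dictionary) for d in docs]
```

The sparse form for the first document holds two entries, while its dense form has one slot per dictionary term. With a real vocabulary of many thousands of terms, that difference is why the output representation matters.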
On Sat, Nov 20, 2010 at 3:45 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> On Sat, Nov 20, 2010 at 8:47 AM, Mike Perry <mikeperrycanada@gmail.com> wrote:
>
> > Hello all,
> >
> > Does the script to convert a Lucene index to Mahout vectors write sequence
> > files in sparse vector representation? My impression is that it doesn't, but
> > I want to verify that.
> >
>
> Should be sparse, but I can't say for sure.
>
>
> > Also, SparseVectorsFromSequenceFiles is used to convert the vectors to
> > sparse format (I know about the seq2sparse option). Could someone point out
> > where in the code it actually constructs the sparse vectors? It seems to me
> > that one of the methods in DictionaryVectorizer generates the vectors, but I
> > couldn't find where exactly.
> >
>
> Look for VectorWritable.
>