mahout-user mailing list archives

From Drew Farris <d...@apache.org>
Subject Re: Sparse Vectors
Date Sun, 21 Nov 2010 20:22:46 GMT
Per o.a.m.utils.vectors.lucene.TFDFMapper, which is called from
o.a.m.utils.vectors.lucene.Driver, the vectors created are instances
of RandomAccessSparseVector, so the output is sparse.

On Sun, Nov 21, 2010 at 9:28 AM, Mike Perry <mikeperrycanada@gmail.com> wrote:
> Thanks Ted for the answer.
>
> "Should be sparse, but I can't say for sure."
>
> Could anybody confirm? In the quickstart-kmeans.sh script there's a step to
> convert the data to SequenceFile format (seqdirectory) and then
> a second step to convert the SequenceFiles to sparse vector format
> (seq2sparse). That's why I'm asking.
>
>
> On Sat, Nov 20, 2010 at 3:45 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> On Sat, Nov 20, 2010 at 8:47 AM, Mike Perry <mikeperrycanada@gmail.com> wrote:
>>
>> > Hello all,
>> >
>> > Does the script to convert a Lucene index to Mahout vectors write sequence
>> > files in sparse vector representation? My impression is that it doesn't, but
>> > I want to verify that.
>> >
>>
>> Should be sparse, but I can't say for sure.
>>
>>
>> > Also, SparseVectorsFromSequenceFiles is used to convert the vectors to
>> > sparse format (I know about the seq2sparse option). Could someone point out
>> > where in the code it actually constructs the sparse vectors? It seems to me
>> > that one of the methods in DictionaryVectorizer generates the vectors, but I
>> > couldn't find where exactly.
>> >
>>
>> Look for VectorWritable.
>>
>
