lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: unique term identifiers
Date Tue, 19 Jan 2010 19:03:47 GMT
Have a look at Mahout (a Lucene sister project), which can create SparseVectors from Lucene term
vectors, where each entry is the term id and the "weight" of the term.  It is trivial to replicate
what Mahout does there to produce LibSVM, ARFF, or whatever format you need.
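
For illustration, here is a minimal sketch of the idea in plain Java. It is not Mahout's code; the class TermIdDictionary and its methods are hypothetical names made up for this example. It assumes you already have the parallel term and frequency arrays for a document, e.g. from TermFreqVector.getTerms() and getTermFrequencies(). Ids are assigned on first sight, and the pairs are sorted by id because LIBSVM's format expects ascending indices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: gives each distinct term an arbitrary, stable 1-based integer id. */
class TermIdDictionary {
    private final Map<String, Integer> ids = new HashMap<String, Integer>();

    /** Returns the id of the term, assigning the next free id the first time it is seen. */
    int idOf(String term) {
        Integer id = ids.get(term);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(term, id);
        }
        return id;
    }

    /**
     * Formats parallel term/frequency arrays as space-separated "id:freq" pairs,
     * sorted by id. Assumes both arrays have the same length.
     */
    String toSparseLine(String[] terms, int[] freqs) {
        // TreeMap keeps the entries ordered by term id.
        TreeMap<Integer, Integer> byId = new TreeMap<Integer, Integer>();
        for (int i = 0; i < terms.length; i++) {
            byId.put(idOf(terms[i]), freqs[i]);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Integer> e : byId.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }
}
```

With the terms {"one", "two"} and frequencies {3, 4} from the example in the question, toSparseLine returns "1:3 2:4". Reuse the same dictionary instance across all documents so that ids stay consistent; a real LIBSVM line would also need a class label in front of the pairs.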

On Jan 18, 2010, at 9:07 AM, Solt, Illés wrote:

> Hi,
> I am looking for a way to represent term frequency data in a vector space, thus using
> unique integer identifiers instead of strings. This would allow feeding tools like LIBSVM from
> a Lucene index.
> A small example: TermFreqVector.toString() produces "{TITLE: one/3, two/4}". What I am
> looking for is "1:3 2:4", where 1 and 2 are arbitrary identifiers; sortedness is not an issue.
> The task can obviously be solved using some Java Map, but it should be less efficient
> than using native Lucene methods.
> I am using 2.9.1, my index can be considered constant.
> Thanks,
> Illes Solt

Grant Ingersoll