lucene-java-user mailing list archives

From "Solt, Illés" <>
Subject unique term identifiers
Date Mon, 18 Jan 2010 14:07:35 GMT

 I am looking for a way to represent term frequency data in a vector 
space, that is, using unique integer identifiers instead of strings. This 
would allow feeding tools like LIBSVM from a Lucene index.

 A small example: TermFreqVector.toString() produces "{TITLE: one/3, 
two/4}". What I am looking for is "1:3 2:4", where 1 and 2 are arbitrary 
identifiers; the order of the entries does not matter.

 The task can obviously be solved with a Java Map, but I expect that to 
be less efficient than using native Lucene methods.
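
 A minimal sketch of the Map-based approach, for comparison. The class 
TermIdEncoder and its methods idOf and encode are hypothetical names, not 
part of Lucene. The two arrays stand in for what 
TermFreqVector.getTerms() and TermFreqVector.getTermFrequencies() return, 
so the sketch runs without an index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Lucene class): maps each distinct term
// to an arbitrary but stable integer id, starting at 1.
final class TermIdEncoder {
    private final Map<String, Integer> ids = new HashMap<String, Integer>();

    // Returns the id already assigned to the term, or assigns the next free one.
    int idOf(String term) {
        Integer id = ids.get(term);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(term, id);
        }
        return id;
    }

    // Formats parallel term/frequency arrays as space-separated "id:freq" pairs.
    // Assumption: terms and freqs have the same length, as the arrays
    // returned by a TermFreqVector do.
    String encode(String[] terms, int[] freqs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(idOf(terms[i])).append(':').append(freqs[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TermIdEncoder encoder = new TermIdEncoder();
        // Stand-in for the "{TITLE: one/3, two/4}" vector from the example.
        System.out.println(encoder.encode(new String[] {"one", "two"}, new int[] {3, 4}));
    }
}
```

 The same encoder instance has to be reused for every document so that a 
term keeps its identifier across the whole index.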

 I am using Lucene 2.9.1, and my index can be considered constant.

Illes Solt

