lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Richard Jones ...@last.fm>
Subject Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)
Date Wed, 02 Nov 2005 13:10:18 GMT
Hi,
I'm using lucene (which rocks, btw ;) behind the scenes at www.last.fm for 
various things, and i've run into a situation that seems somewhat inelegant 
regarding populating fields which i already know the termvector for.

I'm creating a document for each user (last.fm tracks music taste for people), 
with a field that depicts a users favourite 500 artists. Each artist is 
represented by an integer, here's a simple example with 3 artists:

If i've listened to Radiohead (id 1) 10 times, Coldplay (id 2) 5 times and 
Beck (id 3) 2 times, the field would look like this "1 1 1 1 1 1 1 1 1 1 2 2 
2 2 2 3 3"

I use this index for quickly finding "top fans" of an artist or combination of 
artists, comparing peoples music taste and other things on the fly. 

The issue is that i already have the termvecor (radiohead=10, coldplay=5, 
beck=2) handy as a hashtable, and i've found myself building up a string of 
numbers separated by spaces as shown above, then feeding this into lucene (i 
store the termvec of the field in lucene).  Is there a way i could pass a 
termvector directly to lucene to cut out the ugly "turn it into a string and 
let lucene parse it" step? basically i want to provide the termvector for a 
field when inserting a new document, rather than let lucene build it by 
analyzing a string.

This does feel like a rather perverted use of lucene i suppose.. It's faster 
and less hassle than other methods i've tried to date though.

Regards,
RJ







---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message