Not sure if this is feasible, but is there someway you could use a
"fake" analyzer that you constructed using your hashtable/termvector and
then have it output the tokens directly from the hashtable via the
TokenStream? Maybe you would have to pass in an empty/dummy string to
the field constructor. At least you wouldn't have to create the string
of your term vectors
Just a thought,
Grant
Richard Jones wrote:
>Hi,
>I'm using lucene (which rocks, btw ;) behind the scenes at www.last.fm for
>various things, and i've run into a situation that seems somewhat inelegant
>regarding populating fields which i already know the termvector for.
>
>I'm creating a document for each user (last.fm tracks music taste for people),
>with a field that depicts a users favourite 500 artists. Each artist is
>represented by an integer, here's a simple example with 3 artists:
>
>If i've listened to Radiohead (id 1) 10 times, Coldplay (id 2) 5 times and
>Beck (id 3) 2 times, the field would look like this "1 1 1 1 1 1 1 1 1 1 2 2
>2 2 2 3 3"
>
>I use this index for quickly finding "top fans" of an artist or combination of
>artists, comparing peoples music taste and other things on the fly.
>
>The issue is that i already have the termvecor (radiohead=10, coldplay=5,
>beck=2) handy as a hashtable, and i've found myself building up a string of
>numbers separated by spaces as shown above, then feeding this into lucene (i
>store the termvec of the field in lucene). Is there a way i could pass a
>termvector directly to lucene to cut out the ugly "turn it into a string and
>let lucene parse it" step? basically i want to provide the termvector for a
>field when inserting a new document, rather than let lucene build it by
>analyzing a string.
>
>This does feel like a rather perverted use of lucene i suppose.. It's faster
>and less hassle than other methods i've tried to date though.
>
>Regards,
>RJ
>
>
>
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
--
-------------------------------------------------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
337 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|