lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rebecca Watson <bec.wat...@gmail.com>
Subject Re: vector model usage
Date Wed, 02 Jun 2010 05:37:32 GMT
Hi,

if you want to store word+value pairs then use lucene scoring to weight
the words with higher vaules against them, you should look at using payloads
and the DelimitedPayloadTokenFilter which lets you specify e.g.
word1|value1 word2|value2 ...
and the values are stored as payloads against the word tokens (which can
still be analyzed using stemming etc if you put the payload filter at the
beginning of the tokenizer stream set).

then at query time you'd need to use a payload query type to get the weights
included in the scoring for docs --
see the article by lucid imagination here:
http://www.lucidimagination.com/blog/2010/04/18/refresh-getting-started-with-payloads/

not sure if that's the sort of thing you're after?

bec :)

On 2 June 2010 04:29, Dionisis Koumouras <kumdio@gmail.com> wrote:
> Thanks for your reply Grant.
> I checked out the TokenStream class and you are right but I'm afraid I
> didn't really make myself understood. What I want is to be able to create a
> Document out of key-value pairs of terms and float numbers representing word
> weights, insert the Document in the index and then use the lucene scoring
> mechanism to retrieve the entries.
> Do you find this feasible?
>
> On Tue, Jun 1, 2010 at 8:35 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>
>>
>> On May 31, 2010, at 6:25 AM, Dionisis Koumouras wrote:
>>
>> > Hi all,
>> > I'm new to lucene but have used it succesfully for a few simple tasks.
>> >
>> > I am experimenting with the vector space representation of documents and
>> > have managed to store and retrieve TermFreqVector objects.
>> >
>> > The question is whether it is possible to directly add vector space
>> > representations of documents to an index. I can't find any way to create
>> a
>> > document field from a TermFreqVector object.
>>
>> The Field constructor can take in a TokenStream (i.e. a preanalyzed stream)
>> which you could easily back with a TermFreqVector.
>>
>> >
>> > This is the use case behind the question: retrieve some documents from
>> the
>> > index, cluster them, and store the vector space representations of the
>> > clusters back to the index.
>> >
>> > Dionisis
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message