lucene-java-user mailing list archives

From Dionisis Koumouras <kum...@gmail.com>
Subject Re: vector model usage
Date Tue, 08 Jun 2010 10:49:38 GMT
Hi,
this example is really close to what I'm trying to do, but unfortunately it
uses a lot of classes that are outdated in version 2.9.2, which I'm currently
using.

Actually, it uses text in which boosts follow terms (delimited by a special
char), parses the text and then adds documents to the index.

I would like to do the same thing, but building the terms programmatically,
without a text parser. I guess I should study the TermAttribute API.
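
As a stopgap, here is a minimal sketch of the "build it programmatically"
half, using only the JDK. It turns a map of term weights into the
word1|value1 word2|value2 text described in the quoted message, which could
then be handed to an analysis chain containing DelimitedPayloadTokenFilter.
The class name WeightedTerms and the method toDelimitedText are hypothetical
(they are not part of Lucene), and the '|' delimiter is assumed from the
example below. Lucene still parses the text internally, so this does not
remove the parser; a custom TokenStream would be needed for that.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): renders term weights as "term|weight" text. */
class WeightedTerms {

    /** Assumed delimiter between a term and its weight, as in the quoted example. */
    static final char DELIMITER = '|';

    /** Joins the entries as "term|weight" pairs separated by single spaces, in map order. */
    static String toDelimitedText(Map<String, Float> weights) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : weights.entrySet()) {
            String term = e.getKey();
            Float weight = e.getValue();
            if (weight == null || !isPlainTerm(term)) {
                throw new IllegalArgumentException("Unsupported entry: " + e);
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term).append(DELIMITER).append(weight.floatValue());
        }
        return sb.toString();
    }

    /** A term is usable only if it is non-empty and has no whitespace or delimiter in it. */
    private static boolean isPlainTerm(String term) {
        if (term == null || term.isEmpty()) {
            return false;
        }
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == DELIMITER || Character.isWhitespace(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, Float> weights = new LinkedHashMap<String, Float>();
        weights.put("lucene", 2.5f);
        weights.put("vector", 0.75f);
        System.out.println(toDelimitedText(weights)); // prints "lucene|2.5 vector|0.75"
    }
}
```

Terms containing whitespace or the delimiter are rejected rather than
escaped, since they would be split or misread once the text is tokenized.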

thanks bec!


On Wed, Jun 2, 2010 at 8:37 AM, Rebecca Watson <bec.watson@gmail.com> wrote:

> Hi,
>
> if you want to store word+value pairs then use lucene scoring to weight
> the words with higher values against them, you should look at using payloads
> and the DelimitedPayloadTokenFilter which lets you specify e.g.
> word1|value1 word2|value2 ...
> and the values are stored as payloads against the word tokens (which can
> still be analyzed using stemming etc if you put the payload filter at the
> beginning of the tokenizer stream set).
>
> then at query time you'd need to use a payload query type to get the weights
> included in the scoring for docs --
> see the article by Lucid Imagination here:
>
> http://www.lucidimagination.com/blog/2010/04/18/refresh-getting-started-with-payloads/
>
> not sure if that's the sort of thing you're after?
>
> bec :)
>
> On 2 June 2010 04:29, Dionisis Koumouras <kumdio@gmail.com> wrote:
> > Thanks for your reply Grant.
> > I checked out the TokenStream class and you are right but I'm afraid I
> > didn't really make myself understood. What I want is to be able to create a
> > Document out of key-value pairs of terms and float numbers representing word
> > weights, insert the Document in the index and then use the lucene scoring
> > mechanism to retrieve the entries.
> > Do you find this feasible?
> >
> > On Tue, Jun 1, 2010 at 8:35 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> >
> >>
> >> On May 31, 2010, at 6:25 AM, Dionisis Koumouras wrote:
> >>
> >> > Hi all,
> >> > I'm new to lucene but have used it successfully for a few simple tasks.
> >> >
> >> > I am experimenting with the vector space representation of documents and
> >> > have managed to store and retrieve TermFreqVector objects.
> >> >
> >> > The question is whether it is possible to directly add vector space
> >> > representations of documents to an index. I can't find any way to create a
> >> > document field from a TermFreqVector object.
> >>
> >> The Field constructor can take in a TokenStream (i.e. a preanalyzed stream)
> >> which you could easily back with a TermFreqVector.
> >>
> >> >
> >> > This is the use case behind the question: retrieve some documents from the
> >> > index, cluster them, and store the vector space representations of the
> >> > clusters back to the index.
> >> >
> >> > Dionisis
> >>
> >> --------------------------
> >> Grant Ingersoll
> >> http://www.lucidimagination.com/
> >>
> >> Search the Lucene ecosystem using Solr/Lucene:
> >> http://www.lucidimagination.com/search
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
