lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ariel <>
Subject Re: How to build your custom termfreq vector an add it to the field ?
Date Fri, 09 Nov 2007 02:11:03 GMT
Very interesting the link you suggest me Mr Grant Ingersoll.
Let see if I understand how the ranking issue in lucene could be implemented:
1.	First I must create my own query class extending the abstract Query
class. The only method I must implement from this class is toString.
Is right this ???
2.	I must implement inside my own query class the Weight interface
But I really don't understand how this is going to let me change my
ranking scoring.
3 I must implement my custom Scorer ???
I don't know how integrate this. There is a lot of little pieces of
information but not concrete.

On Nov 7, 2007 1:48 PM, Grant Ingersoll <> wrote:
> Term Vectors (specifically TermFreqVector) in Lucene are a storage
> mechanism for convenience and applications to use.  They are not an
> integral part of the scoring in the way you may be thinking of them in
> terms of the traditional Vector Space Model, thus there may be some
> confusion from the different usages of that terminology.  If you want
> to see examples of how to implement scorers have a look at classes
> like TermScorer, BoostingTermQuery, and any of the other classes that
> extend Scorer.  You might also find the file formats page (off of the
> Lucene Java website under Documentation) helpful for understanding
> what Lucene stores so that it can do scoring.
> There really isn't any tutorial on scoring, as it is not something
> that many people have expressed an interest in or no one has made it a
> high enough priority to write one.  Having written a Scorer (or maybe
> two, I forget) I can give advice on specific things, but I am not sure
> I could write a tutorial that is general enough to be useful at this
> point.
> One thought for associating a weight to a given term based on its
> cooccurring terms is to use the new Payload mechanism whereby you can
> store a byte array at each term which can then be used in scoring via
> things like the BoostingTermQuery (or your own implementation.)  If
> that is of interest, you can search the archives for payloads (I also
> think Michael Busch is presenting on Payloads, amongst other things,
> at ApacheCon in Atlanta) and have a look at the BoostingTermQuery.
> There certainly are other PayloadQueries that need to be implemented.
> See the Lucene wiki for some background and details on Payloads as well.
> I don't know that it is a big mistake to try this in Lucene.  The
> community hasn't put a huge priority on making altering the innards of
> scoring easier to deal with (if possible), but that doesn't mean we
> are not open to suggestions and patches.    You may find
>   to be informative for both the implementation and the discussion of
> things that need to happen to be accepted into Lucene.  This JIRA
> issue specifically attempts to provide Lucene with a new scoring
> mechanism.
> You might also have a look at Lemur (
> which is much more academically focused.
> Cheers,
> Grant
> On Nov 7, 2007, at 12:49 PM, Ariel wrote:
> > Then if I want to use another scoring formula I must to implement my
> > own Query/Weigh/Scorer  ? For example instead of cousine distance
> > leiderbage distance or .. another. I'm studying Query/Weigh/Scorer
> > classes to find out how to do that but there is not much documentation
> > about that.
> >
> > I have seen I could change similarity factors extending the simlarity
> > class, but I have not seen any example about changing scoring formula
> > and changing the weight by term in the term vector. Do you know any
> > tutorial about this ?
> >
> > What I want to do changing frecuency in the terms vector is this: for
> > example instead of take the tf term frecuency of the term and stored
> > in the vector I want to consider the correlation of the term with the
> > other terms of the documents and store that measure by term in the
> > vector so later with my custom similarity formula calculate the
> > ranking of a document against a query considering the correlation
> > between terms.
> > Dou you think is a big mistake try to do this with lucene ??? Is
> > there any way ?
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> > For additional commands, e-mail:
> >
> --------------------------
> Grant Ingersoll
> Lucene Boot Camp Training:
> ApacheCon Atlanta, Nov. 12, 2007.  Sign up now!
> Lucene Helpful Hints:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message