lucene-dev mailing list archives

From "Nadav Har'El" <...@il.ibm.com>
Subject Re: Scoring
Date Thu, 15 Jun 2006 10:03:12 GMT
One interesting thing to talk about is when you need to create a new Query
subclass, and how to do it.

For example, let's say you want something between a BooleanQuery and a
PhraseQuery: a query that matches documents containing some of the query
words (like the normal BooleanQuery) but gives a higher score to documents
in which these words appear near each other (there was a discussion about
this idea about a month ago, when we discussed short documents).
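
To make the scoring idea concrete, here is a toy sketch in plain Java. It is not Lucene code and does not say which Lucene classes would need subclassing. The class name ProximityBoostSketch, the method score, and the bonus formula (1 divided by the smallest gap between two different query terms) are all assumptions made up for illustration, and query terms are assumed to be lowercase already.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration only; none of these names exist in Lucene. */
final class ProximityBoostSketch {

    /**
     * Hypothetical rule: one point per distinct query term found in the
     * text, plus a bonus of 1/gap, where gap is the smallest distance (in
     * token positions) between occurrences of two different query terms.
     */
    static double score(String text, Set<String> queryTerms) {
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        Set<String> matched = new HashSet<>();
        String prevTerm = null;          // most recent query term seen
        int prevPos = -1;                // token position of prevTerm
        int minGap = Integer.MAX_VALUE;  // smallest gap found so far

        for (int pos = 0; pos < tokens.size(); pos++) {
            String tok = tokens.get(pos);
            if (!queryTerms.contains(tok)) {
                continue;
            }
            matched.add(tok);
            // Only consecutive matches of two different terms can give the
            // smallest gap, so comparing with the previous match is enough.
            if (prevTerm != null && !prevTerm.equals(tok)) {
                minGap = Math.min(minGap, pos - prevPos);
            }
            prevTerm = tok;
            prevPos = pos;
        }

        double base = matched.size();  // the BooleanQuery-like part
        double bonus = (minGap == Integer.MAX_VALUE) ? 0.0 : 1.0 / minGap;
        return base + bonus;           // the proximity part is added on top
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("lucene", "scoring"));
        System.out.println(score("lucene scoring explained", query));
        System.out.println(score("lucene is a library and scoring is one part", query));
        System.out.println(score("lucene only", query));
    }
}
```

With this rule, a document with the two query words side by side scores higher than one with the words far apart, and both score higher than a document containing only one of the words.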

In that case, what do I need to do? I suppose I need to write a new Query
subclass, but what does doing this take? Do I need to write a "Scorer"? A
"Similarity"? Or what?

I think this is an interesting topic.

--
Nadav Har'El
nyh@il.ibm.com
+972-4-829-6326

From: Grant Ingersoll <gsingers@syr.edu>
Date: 15/06/2006 03:01 AM
To: java-dev@lucene.apache.org
Subject: Re: Scoring
Please respond to: java-dev@lucene.apache.org

Karl,

This is a great start.  I have also started a scoring.xml document under
the xdocs directory (in my sandbox).  So far, I have the following
sections (some even have content under them!):
1. Introduction  // Intro about Vector Space Model, some references to
theory, links to the Similarity scoring Formula
2. Scoring and the Index   //How scoring relates to what is in the index
(i.e. how it takes advantage of precomputed info such as norms, etc.)
3. Understanding Similarity  //How the Similarity class fits into
Scoring and what it means to override the Similarity (Greek Kung Fu!)
4. Changing Your Scoring -- Expert // A discussion of
overriding/creating Scorer/Query/Whatever else
5. Class Diagrams // Links to your cool pictures
6. Sequence Diagrams //More cool pictures

What else is needed/useful?  Anyone want to volunteer on a section?

-Grant
karl wettin wrote:
> On Wed, 2006-06-07 at 08:27 -0400, Grant Ingersoll wrote:
>
>> I have started something in my sandbox that goes in the xdocs directory
>> that is going to cover the scoring and how it works (something parallel
>> in spirit to the file formats documentation).  Adding in sequence
>> diagrams and whatever you have would be a perfect fit.  I would be happy
>> to coordinate with you, as you may end up getting to it before me.
>>
>> I would also like to see, possibly, some package level documentation and
>> more javadocs.
>>
>
> My first day (night) of getting to know how the documents matching a
> query are found and scored ended with an initial class diagram.
>
> <http://wiki.apache.org/jakarta-lucene/KarlWettin?action=AttachFile&do=view&target=search_uml_1.jpg>
> <http://shorl.com/hynulymolijo>
>
> Feel free to let me know what I got wrong.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>

--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244

http://www.cnlp.org
Voice:  315-443-5484
Fax: 315-443-6886





