lucene-java-user mailing list archives

From Sean O'Connor
Subject Location of code which determines a Hit for PhraseQuery
Date Thu, 08 Sep 2005 03:16:48 GMT
    I am trying to work through the Hit collection process for a 
PhraseQuery (using an exact phrase). For an example search, say I'm 
looking for:
"lucene action"  (quotes indicating exact phrase)

in a one-document, one-field index consisting of:
wow, lucene rocks, lucene action items are cool, very action packed

So my questions are:
- What does (Default)Similarity.idf() do?
    Just create a base frequency for a term in the query?
    i.e. it creates something of a raw score, based on term (not phrase) 
frequency in a doc

- Does (Phrase)Weight do the work of finding a phrase match?
    I don't think so, but I get confused about how the work is split 
between the weight and the scorer

- Does ExactPhraseScorer do the work of finding a phrase match?
    I think so, but seem to keep missing where
    i.e. where does the first instance of "lucene" (lucene rocks) get 
skipped but the second one (lucene action) become a hit?

- What does SegmentTermPositions do?
    I think this is critical to the PhrasePositions process, which 
PhraseScorer uses
    I think it holds pointers to the positions in the index file streams 
(i.e. the ability to read the indexes?)

- Where does the code identify a 'proper' hit for something like an 
exact phrase?
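
    To make the last question concrete, here is a minimal sketch of 
position-based exact phrase matching on the sample text above. It is not 
Lucene's code: the class ExactPhraseSketch, its methods, and the naive 
tokenizer are all hypothetical, and the positions are held in in-memory 
lists rather than read from the index. It only shows the rule being asked 
about: a hit requires term i of the phrase to sit at position start + i.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not part of Lucene.
public class ExactPhraseSketch {

    // Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
    // The token's index in the returned list is its position.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // All positions at which the given term occurs.
    static List<Integer> positions(List<String> tokens, String term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(term)) {
                result.add(i);
            }
        }
        return result;
    }

    // Start positions of every exact occurrence of the phrase.
    static List<Integer> phraseStarts(List<String> tokens, String... phrase) {
        // One position list per phrase term.
        List<List<Integer>> perTerm = new ArrayList<>();
        for (String term : phrase) {
            perTerm.add(positions(tokens, term));
        }
        List<Integer> starts = new ArrayList<>();
        for (int start : perTerm.get(0)) {
            boolean match = true;
            // Term i must occur exactly i positions after the first term.
            for (int i = 1; i < phrase.length && match; i++) {
                match = perTerm.get(i).contains(start + i);
            }
            if (match) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize(
            "wow, lucene rocks, lucene action items are cool, very action packed");
        // "lucene" occurs at positions 1 and 3, "action" at 4 and 9.
        // Only the "lucene" at 3 is followed by "action" at 4.
        System.out.println(phraseStarts(tokens, "lucene", "action")); // prints [3]
    }
}
```

    In this sketch the first "lucene" (position 1) is skipped because 
position 2 holds "rocks", not "action"; the second one (position 3) is a 
hit. The size of the returned list would be the phrase frequency for the 
document, and the start positions are the kind of information needed to 
report where in the text a hit occurred.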

    Apologies in advance for the poor sample text above, and for the 
repetition across the questions. Hopefully I am getting closer to 
wrapping my head around the query/hit process (after which I plan to work 
on extending the hits to include offset positions).

