lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Search Hit frequency and location
Date Thu, 16 Jun 2005 20:39:53 GMT
On Thursday 16 June 2005 21:03, Sean O'Connor wrote:
> Thanks for the clarification. I had assumed that to be the case, but 
> assumptions have a tendency to come back and bite me in inappropriate 
> places. By pointing that out, you've probably saved me from beating my 
> head against the wall in the near future :-).
> 
> The big stumbling block I have at the moment is understanding whether 
> Terms can be used to find something like a phrase query, proximity 
> query, or boolean query. I think the answer is no, two different 

Terms are normally used as building blocks for all of these.

> concepts. But I also tend to think that the wheel has already been 
> invented to find how many times a phrase (i.e. "Lucene in Action") 
> appears in a document. Before I go digging through the source code, and 
> possibly creating some rather embarrassing hack(s), I thought I would 
> check to see if there is a 'right' way to go about this.

Counting how often a query occurs is actually a simplification
of the normal query search. A PhraseQuery for the phrase
"lucene in action" provides an ExactPhraseScorer to score
each document. Have a look at the code of that scorer: its superclass
PhraseScorer needs only a minor modification to provide the phrase
frequency in a document as the score() for that document.
Getting from a Query to a Scorer is somewhat involved, but
it is documented in the javadocs of Weight.
An easy way to implement this is to copy PhraseQuery into a class
of your own so that it uses your modified versions of (Exact)PhraseScorer.
Once this query works, override the corresponding get...() method in
your subclass of QueryParser to return that query instead of PhraseQuery.
For testing, have a look at TestPhraseQuery.java in the src/test directory.
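
To make the idea of a phrase frequency concrete, here is a minimal
standalone sketch. It is not Lucene code and does not use the Lucene
API: the class name PhraseFrequency and the method count() are made up
for illustration, and the per-term position arrays are assumed to have
been collected already for a single document (for example via
IndexReader.termPositions()). It only shows the kind of counting that an
exact phrase match amounts to: a phrase occurs wherever term i sits
exactly i positions after the first term.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
public class PhraseFrequency {

    /**
     * Counts exact occurrences of a phrase in one document.
     * positions.get(i) must hold the sorted positions of the i-th
     * phrase term within that document (an assumption of this sketch).
     */
    public static int count(List<int[]> positions) {
        if (positions.isEmpty()) {
            return 0;
        }
        int freq = 0;
        // Try every position of the first term as a phrase start.
        for (int start : positions.get(0)) {
            boolean match = true;
            // Term i must occur exactly at start + i.
            for (int i = 1; i < positions.size() && match; i++) {
                match = Arrays.binarySearch(positions.get(i), start + i) >= 0;
            }
            if (match) {
                freq++;
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        // Positions for a made-up document containing
        // "lucene in action" twice and "lucene action" once.
        List<int[]> positions = Arrays.asList(
            new int[] {0, 5, 9},    // "lucene"
            new int[] {1, 6},       // "in"
            new int[] {2, 7, 10});  // "action"
        System.out.println(count(positions)); // prints 2
    }
}
```

A modified scorer would return such a count from score() instead of a
weighted similarity value.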
 
> Alternatively, any suggestions on what to google, or where to look to 
> educate myself would be welcome as well.

TermQuery and TermScorer make a good starting point. To save
some reading, ignore the explain() methods initially.
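
As a reading aid, the following standalone sketch mimics the general
shape of a scorer that iterates over the documents matching one term:
advance with next(), read the current document with doc(), and get a
value from score(). It is an assumption-laden toy, not TermScorer
itself: the class name TermFreqScorerSketch is made up, the postings are
plain arrays rather than an index, and score() simply returns the raw
within-document frequency instead of a real similarity score.

```java
// Hypothetical illustration only; not part of Lucene.
public class TermFreqScorerSketch {
    private final int[] docs;   // matching document numbers, ascending
    private final int[] freqs;  // frequency of the term in each document
    private int pointer = -1;   // index of the current document

    public TermFreqScorerSketch(int[] docs, int[] freqs) {
        if (docs.length != freqs.length) {
            throw new IllegalArgumentException("docs and freqs differ in length");
        }
        this.docs = docs;
        this.freqs = freqs;
    }

    /** Moves to the next matching document; returns false when exhausted. */
    public boolean next() {
        if (pointer < docs.length) {
            pointer++;
        }
        return pointer < docs.length;
    }

    /** Current document number; only valid after next() returned true. */
    public int doc() {
        return docs[pointer];
    }

    /** Raw term frequency in the current document, used here as the score. */
    public float score() {
        return freqs[pointer];
    }

    public static void main(String[] args) {
        TermFreqScorerSketch scorer =
            new TermFreqScorerSketch(new int[] {2, 7}, new int[] {3, 1});
        while (scorer.next()) {
            System.out.println("doc " + scorer.doc() + " freq " + scorer.score());
        }
    }
}
```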

> Cheers,
> Sean

Proost,
Paul Elschot.
 
> 
> 
> Erik Hatcher wrote:
> 
> >
> > On Jun 16, 2005, at 12:03 PM, Sean O'Connor wrote:
> >
> >> Yes, see the Javadoc for IndexReader.termPositions().
> >>     I'm probably missing the obvious here, but I assume this refers to
> >> the analyzed terms (i.e. individual words, possibly transmogrified by
> >> the analyzer).
> >
> >
> > Just to respond to part of your mail:
> >
> > Terms do not necessarily come from analysis... they could be  
> > specified directly using Field.Keyword() for example.  Any _indexed_  
> > field has term(s), with the possibility that the indexed field is  
> > analyzed or not.
> >
> >     Erik
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> 
> 
> 
> 



