lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: Search Hit frequency and location
Date Thu, 16 Jun 2005 20:26:23 GMT

On Jun 16, 2005, at 3:03 PM, Sean O'Connor wrote:
> The big stumbling block I have at the moment is understanding  
> whether Terms can be used to find something like a phrase query,  
> proximity query, or boolean query. I think the answer is no, two  
> different concepts.

Terms are the heart of searching.  All queries boil down to finding  
specific terms.  A PhraseQuery finds terms exactly side-by-side  
position-wise (or near by some slop factor).  BooleanQuery is an  
aggregation of other primitive queries, so its sort of special in  
that it doesn't deal with terms directly, but all primitive  
aggregated queries do.  A PrefixQuery is not a "primitive" query - it  
rewrites itself to a BooleanQuery OR'd with all the matching terms.   
So the terms are the "atoms" of Lucene.  Hopefully this is in-line  
with the conceptual stumbling block.

> But I also tend to think that the wheel has already been invented  
> to find how many times a phrase (i.e. "Lucene in Action") appears  
> in a document. Before I go digging through the source code, and  
> possibly creating some rather embarrassing hack(s), I thought I  
> would check to see if there is a 'right' way to go about this.

Deep within PhraseQuery the number of occurrences is tracked and used  
in scoring.  But at the high level of searching, that detail is not  
exposed.  Using IndexSearcher.explain to see the tf() of a  
PhraseQuery may help, but by default it'll be the square root of the  
number of occurrences.  (at least I hope I'm on track with that  
explanation - please correct me if I've misspoken)  I'd love to hear  
how others have done this sort of thing - find the number of  
occurrences of a phrase.

> Alternatively, any suggestions on what to google, or where to look  
> to educate myself would be welcome as well.

You're in the right place.  Keep asking - we're all here to learn and  
share what we know.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message