lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Boris Goldowsky <bo...@alum.mit.edu>
Subject RE: Indexing multiple instances of the same field for each document
Date Fri, 27 Feb 2004 22:06:56 GMT
On Fri, 2004-02-27 at 12:12, Roy Klein wrote:
> Doug,
> 
> The query results are different, I'm attaching my test code.
> 
> >> Also FYI, I found that phrase queries don't work against a field 
> >> that's been added multiple times. If I query the phrase "brown fox", 
> >> against the two example docs above, only the second matches.

> My results:
> Query: contents:quick contents:brown 
> Hits: 2 
> 1  
> 2  
> PQuery: 
> contents:"quick brown" 
> Phrase Hits: 1 
> 2 

By my intuition, what Lucene is doing here is the correct behavior.  Say
for a moment your field was author, and you index multiple authors by
having multiple occurrences of a Field named author.  So you might have:
author: John Smith
author: Mary Jones

now the query 
  author:smith author:jones 
should return this document, but the query 
  author:"john jones"
should not.  It would be unfortunate if this were "fixed" to return the
document with the phrasal query, since the two words in different fields
do not in fact occur as a phrase.

If you have "the quick brown fox..." as in your example, your indexing
code should combine them all into a single field before adding them to
the Document.

Just my humble opinion, of course...

Boris

-- 
Boris Goldowsky
boris@alum.mit.edu
www.goldowsky.com/consulting


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message