lucene-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Document proximity
Date Wed, 30 Mar 2005 19:10:39 GMT
DM Smith wrote:
> We already have a solution, and it is external to Lucene. We look for
> hits on things that are to be adjacent, get their "canonical"
> reference and then compare the distances between these. While this
> works well, I was hoping for a solution within Lucene.
> 
> This does not give us the ability to look for phrases across verse boundaries.

Yes and no. Let's look at this document structure:

field1: current
field2: prev+current+next

Then expand each query into "field1:query OR field2:query". Terms and
phrases that fit entirely within the current verse will score higher
than those that span a verse boundary, because the former match both
fields and so get an additional boost from field1.
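
As a rough sketch of the idea above (plain Java, no Lucene calls), the
following builds the two field values per verse and expands a user query
into the two-field form. The class and method names are hypothetical,
joining verses with a single space is an assumption, and the
field:(...) grouping relies on the standard Lucene query parser syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Lucene.
final class VerseWindowSketch {

    // For each verse, produce field1 (the verse itself) and
    // field2 (previous + current + next, where those neighbours exist).
    static List<Map<String, String>> buildDocs(List<String> verses) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < verses.size(); i++) {
            String current = verses.get(i);
            StringBuilder window = new StringBuilder();
            if (i > 0) {
                window.append(verses.get(i - 1)).append(' ');
            }
            window.append(current);
            if (i + 1 < verses.size()) {
                window.append(' ').append(verses.get(i + 1));
            }
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("field1", current);
            doc.put("field2", window.toString());
            docs.add(doc);
        }
        return docs;
    }

    // Expand a user query so it is tried against both fields.
    // The parentheses keep multi-term queries bound to each field.
    static String expand(String userQuery) {
        return "field1:(" + userQuery + ") OR field2:(" + userQuery + ")";
    }
}
```

For example, VerseWindowSketch.expand("\"b c\"") yields the string
field1:("b c") OR field2:("b c"), which would then be handed to the
query parser. A phrase crossing a verse boundary can only match field2.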

> As to storing book or chapter in the index, we don't do that, just the
> whole reference.
> This is worth looking into as it would help in doing range restricted
> searches. Today, we do the restriction after the search.

This would need some testing, but I would suggest splitting the
reference into two fields: one for the book name, the other for a
combined chapter/verse value stored as an integer.
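
One possible encoding, sketched in plain Java under assumptions of my
own: chapter * 1000 + verse as the combined integer, zero-padded to six
digits so that a term-based (lexicographic) range comparison agrees with
numeric order. The field names "book" and "chapverse" and all class and
method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only; not part of Lucene.
final class VerseRefSketch {

    // Assumed upper bound (exclusive) on verses per chapter.
    static final int VERSE_SLOTS = 1000;

    // Combine chapter and verse into one sortable integer.
    static int encode(int chapter, int verse) {
        if (chapter < 1 || chapter >= 1000 || verse < 1 || verse >= VERSE_SLOTS) {
            throw new IllegalArgumentException("chapter/verse out of range");
        }
        return chapter * VERSE_SLOTS + verse;
    }

    // Zero-pad so string order matches numeric order.
    static String pad(int key) {
        return String.format(Locale.ROOT, "%06d", key);
    }

    // Build a query string restricting hits to one book and an
    // inclusive chapter/verse range, using [a TO b] range syntax.
    static String rangeQuery(String book, int fromChapter, int fromVerse,
                             int toChapter, int toVerse) {
        return "book:" + book
                + " AND chapverse:[" + pad(encode(fromChapter, fromVerse))
                + " TO " + pad(encode(toChapter, toVerse)) + "]";
    }
}
```

For example, rangeQuery("John", 3, 1, 3, 21) gives
book:John AND chapverse:[003001 TO 003021]. Book names containing
spaces would need quoting or a normalized form, which this sketch does
not handle.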

-- 
Best regards,
Andrzej Bialecki
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


