lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: search that span over consecutive documents
Date Sat, 09 Jul 2005 01:22:33 GMT

On Jul 8, 2005, at 2:57 AM, Daniel Moldovan wrote:
> My application must index a lot of books that are stored in XML files.
>
> Each XML file represents a page of the book, and this way each page
> becomes a Lucene Document.
>
> Each page is organized in different sections, and finally each section
> contains lines.
>
> What I need to do is give the user the possibility to search for a
> phrase that starts at the end of a page and continues on the next page.
> The span should have some limits, let's say, 6 words on each page.
>
> Has anyone experienced this kind of search? Please share your
> knowledge if you did.

You're lucky you get to represent your data so hierarchically!  Try  
getting scholars to represent a book in such a fashion!!!  (I'm  
dealing with scholarly works in XML format and sections do not fall  
_within_ pages, they can span across pages).

In this case, one field of your document should probably index the
page's text plus the last 6 words of the previous page and the first
6 words of the next page.  Maybe you also have a second field that
holds only the page itself.  Perhaps something at query time decides
which field to search?  Maybe all phrase queries use the overlapped
field and other query types use the single-page field?
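
To make the overlapped-field idea concrete, here is a rough sketch in plain
Python (not Lucene code) of how the text for each field might be assembled
before indexing, and how a query could be routed to one field or the other.
The field names "page" and "page_overlap", the helper names, and the
"quoted string means phrase query" test are all assumptions made for
illustration; nothing here comes from the Lucene API.

```python
from typing import List

OVERLAP_WORDS = 6  # the limit suggested in the original question


def overlapped_field_text(pages: List[str], index: int,
                          overlap: int = OVERLAP_WORDS) -> str:
    """Build the text for the hypothetical "page_overlap" field.

    Returns the page's own words, preceded by up to `overlap` trailing
    words of the previous page and followed by up to `overlap` leading
    words of the next page. Words are split on whitespace only.
    """
    current = pages[index].split()
    before: List[str] = []
    after: List[str] = []
    if overlap > 0:
        if index > 0:
            before = pages[index - 1].split()[-overlap:]
        if index + 1 < len(pages):
            after = pages[index + 1].split()[:overlap]
    return " ".join(before + current + after)


def choose_field(query: str) -> str:
    """Pick a field name at query time (assumed convention).

    A query wrapped in double quotes is treated as a phrase query and
    sent to the overlapped field; anything else goes to the plain page
    field.
    """
    q = query.strip()
    is_phrase = len(q) >= 2 and q.startswith('"') and q.endswith('"')
    return "page_overlap" if is_phrase else "page"
```

One thing to watch with this layout: a phrase that straddles a page break
can end up in the overlapped field of both neighbouring pages, so the same
hit may come back twice and might need collapsing in the results.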

     Erik



