lucene-java-user mailing list archives

From John Byrne
Subject Queries spanning paragraphs
Date Mon, 22 Oct 2007 11:31:29 GMT
Hi all,

I need the ability to match documents that have two terms that occur 
within n paragraphs of each other. I had a look through the archives, 
and although many people have explained ways to implement per-sentence 
or per-paragraph indexing and searching, no one seems to have tackled this 
one yet.

The only idea I can come up with is this:

I will index the entire document, as normal, but also index the 
paragraphs separately, numbering them according to the order they occur 
in. (Storage space isn't an issue). When searching, I will first find 
all documents that have both terms, using the full-content field.

Then I can get all the paragraphs that are part of that doc, and have 
either of the search terms. I would still have to implement a bit of 
logic to check which paragraphs have which term, and check the distance 
between them (from the order info I kept when indexing).
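
The distance check in that second step could look roughly like the sketch below. It is plain Java and does not touch the Lucene API; it assumes the paragraph numbers for each term have already been collected from the per-paragraph hits and sorted in ascending order. The class and method names are hypothetical.

```java
final class ParagraphDistance {

    /**
     * Returns true if some paragraph containing term A and some paragraph
     * containing term B are at most n paragraphs apart.
     * Both arrays hold paragraph numbers and must be sorted ascending.
     */
    static boolean withinParagraphs(int[] parasA, int[] parasB, int n) {
        int i = 0;
        int j = 0;
        while (i < parasA.length && j < parasB.length) {
            int pa = parasA[i];
            int pb = parasB[j];
            if (Math.abs(pa - pb) <= n) {
                return true;
            }
            // Advance the side that is behind; moving the other side
            // could only widen the gap.
            if (pa < pb) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Term A occurs in paragraphs 2 and 9, term B in paragraphs 5 and 14.
        int[] a = {2, 9};
        int[] b = {5, 14};
        System.out.println(withinParagraphs(a, b, 3));
        System.out.println(withinParagraphs(a, b, 2));
    }
}
```

Because both lists are sorted, one merge-style pass is enough, so the per-document cost is linear in the number of matching paragraphs rather than quadratic.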

I'm sure this would work, but it would be very slow. I can't help 
feeling there's a better solution, that might involve inserting 
paragraph tags into the content in a special field in my index, and 
somehow using SpanQueries to find matches that have a given number of 
paragraph marks in between... but I don't know if that's possible.
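
To make the paragraph-mark idea concrete, here is a plain-Java sketch of the counting logic only. It is not a SpanQuery and says nothing about whether Lucene can do this directly; it assumes the content has been tokenized into one stream with a hypothetical marker token ("<p>") inserted at every paragraph break, and that the two terms are different.

```java
final class ParagraphMarkerScan {

    // Hypothetical marker token; anything that cannot occur in real text would do.
    static final String PARA = "<p>";

    /**
     * Scans the token stream once and returns true if termA and termB occur
     * with at most n paragraph markers between them.
     */
    static boolean termsWithin(String[] tokens, String termA, String termB, int n) {
        int para = 0;    // number of markers seen so far
        int lastA = -1;  // paragraph of the most recent termA, -1 if not seen yet
        int lastB = -1;  // paragraph of the most recent termB, -1 if not seen yet
        for (String t : tokens) {
            if (t.equals(PARA)) {
                para++;
                continue;
            }
            if (t.equals(termA)) {
                lastA = para;
            }
            if (t.equals(termB)) {
                lastB = para;
            }
            // The most recent occurrence of the other term is always the nearest
            // one seen so far, because paragraph numbers only increase.
            if (lastA >= 0 && lastB >= 0 && Math.abs(lastA - lastB) <= n) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tokens = {"lucene", "index", PARA, "filler", PARA, "span", "query"};
        System.out.println(termsWithin(tokens, "lucene", "span", 2));
        System.out.println(termsWithin(tokens, "lucene", "span", 1));
    }
}
```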

Does anyone have any ideas?

John B.
