lucene-java-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: Proximity Search for SENTENCE and PARAGRAPH
Date Mon, 07 Apr 2014 12:03:57 GMT
One simple hack which may or may not meet your objectives:

1) index each paragraph as if it were a document (Boolean queries could then no longer match
across paragraphs, which could be a problem)

2) set the position increment gap to, say, 100 and then index each sentence within the paragraph
as another value in a multivalued field.  This would then prevent phrasal matches across sentence
boundaries if the user is searching for proximity < 100.
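
To make the position-gap idea in (2) concrete, here is a minimal plain-Java simulation. It does not use
the Lucene API; the class and method names (PositionGapSketch, index, near) are hypothetical, and the
position arithmetic is only an approximation of what an analyzer with a position increment gap of 100
would produce. It shows why a proximity limit below the gap cannot pair terms from different sentences.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation (not Lucene API) of token positions in a multivalued field. */
final class PositionGapSketch {

    /**
     * Assigns a position to every token. Each sentence is treated as one value of a
     * multivalued field, and every value after the first starts `gap` extra positions later.
     */
    static Map<String, List<Integer>> index(List<String> sentences, int gap) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = -1;
        boolean first = true;
        for (String sentence : sentences) {
            if (!first) {
                position += gap; // assumed stand-in for the position increment gap
            }
            first = false;
            for (String token : sentence.toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                position++;
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
            }
        }
        return postings;
    }

    /** True if some occurrence of a and some occurrence of b are at most maxDistance positions apart. */
    static boolean near(Map<String, List<Integer>> postings, String a, String b, int maxDistance) {
        List<Integer> none = Collections.emptyList();
        for (int pa : postings.getOrDefault(a, none)) {
            for (int pb : postings.getOrDefault(b, none)) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sentences = new ArrayList<>();
        sentences.add("The quick fox jumps.");
        sentences.add("A lazy dog sleeps.");
        Map<String, List<Integer>> postings = index(sentences, 100);
        System.out.println(near(postings, "fox", "jumps", 5)); // same sentence
        System.out.println(near(postings, "fox", "dog", 5));   // different sentences
    }
}
```

The guarantee only holds while the requested distance stays below the gap, so the gap should be
larger than any proximity value users are allowed to enter.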

Another hack along the lines you mention would be to index a token that cannot occur in real
text, e.g. "SENTENCE" or "PARAGRAPH", at each boundary and then wrap the user's query in a
SpanNotQuery that excludes spans containing that token.  LUCENE-5205's SpanOnlyParser
might be of use for this.
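
The boundary-token idea can be sketched the same way. The following is a plain-Java simulation, not
Lucene code: the class BoundaryTokenSketch and its methods are hypothetical, and the "no boundary
token between the two terms" check only mimics the effect one would hope to get from wrapping a
proximity query in a SpanNotQuery. It assumes text tokens are lowercased, so the uppercase marker
can never collide with a real term.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java simulation (not Lucene API) of a proximity match that may not span a boundary token. */
final class BoundaryTokenSketch {

    /** Marker appended after every sentence; uppercase so it cannot equal a lowercased text token. */
    static final String BOUNDARY = "SENTENCE";

    /** Lowercases and splits each sentence, appending the boundary marker after each one. */
    static List<String> tokenize(List<String> sentences) {
        List<String> tokens = new ArrayList<>();
        for (String sentence : sentences) {
            for (String token : sentence.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
            tokens.add(BOUNDARY);
        }
        return tokens;
    }

    /**
     * True if a and b (expected in lowercase) occur within maxDistance positions of each
     * other with no boundary marker between them.
     */
    static boolean nearWithinSentence(List<String> tokens, String a, String b, int maxDistance) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) {
                continue;
            }
            int from = Math.max(0, i - maxDistance);
            int to = Math.min(tokens.size() - 1, i + maxDistance);
            for (int j = from; j <= to; j++) {
                if (j == i || !tokens.get(j).equals(b)) {
                    continue;
                }
                // Reject the candidate pair if a boundary marker lies between the two terms.
                List<String> between = tokens.subList(Math.min(i, j), Math.max(i, j));
                if (!between.contains(BOUNDARY)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sentences = new ArrayList<>();
        sentences.add("The quick fox jumps.");
        sentences.add("A lazy dog sleeps.");
        List<String> tokens = tokenize(sentences);
        System.out.println(nearWithinSentence(tokens, "fox", "jumps", 10));  // same sentence
        System.out.println(nearWithinSentence(tokens, "jumps", "lazy", 10)); // crosses a boundary
    }
}
```

Unlike the position-gap hack, this variant does not cap the proximity value, but every boundary
marker occupies a position of its own in the token stream.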

You may also want to look into the PostingsHighlighter's use of BreakIterator for ideas.  It
isn't immediately clear to me how that could be used for retrieval, but it does work for highlighting.

-----Original Message-----
From: Jigar Shah [mailto:jigaronline@gmail.com] 
Sent: Monday, April 07, 2014 3:47 AM
To: java-user@lucene.apache.org
Subject: Proximity Search for SENTENCE and PARAGRAPH

Hello all,

I need to implement 2 features in my application:

1. "Proximity for words and phrases within the same sentence"

2. "Proximity for words and phrases within the same paragraph"

Doing some research on the internet, I found the following.

There is "ProximityQueryNode", which has an enum for this, but there seems
to be no support for it in the parser.

There is no out-of-the-box support or contrib module for such a feature,
except one,
https://github.com/markrmiller/qsol, which is not maintained.

Some suggested workarounds mark sentence/paragraph boundaries
and then search using the SpanQuery API.

Please let me know if any work has been done on such features, or if there
is a proven approach.

Thanks
Jigar Shah.


