lucene-java-user mailing list archives

From Steven A Rowe <sar...@syr.edu>
Subject RE: Issue with sentence specific search
Date Wed, 06 Oct 2010 20:02:30 GMT
Hi Sirish,

Have you looked at the SpanQuery classes yet?

http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/spans/package-summary.html

See also this Lucid Imagination blog post by Mark Miller:

http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/

One common technique, instead of using a larger-than-normal position increment gap between
sentences, is to index a sentence boundary token such as '$', or any other token that will never
itself be the target of a search.  Quoting from a post Mark Miller made to the lucene-user list last
year (<http://www.lucidimagination.com/search/document/c9641cbb1a3bf928/multiline_regex_with_lucene>):

	First you inject special marker tokens as your paragraph/
	sentence markers, then you use a SpanNotQuery that looks
	for a SpanNearQuery that doesn't intersect with a
	SpanTermQuery containing the special marker term.
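
To make the marker idea concrete, here is a minimal plain-Java sketch of the token stream an
index-time filter would need to produce. It does not use any Lucene API: the class name
SentenceMarkers, the inject method, and the whitespace tokenization are assumptions made purely
for illustration, and a real implementation would do this inside a custom TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: models the token stream
// that results from injecting a boundary marker after each sentence.
final class SentenceMarkers {
    // Marker token; assumed never to be searched for directly.
    static final String MARKER = "$";

    // Lower-cases and whitespace-splits each sentence, then appends
    // one marker token after the last word of every sentence.
    static List<String> inject(List<String> sentences) {
        List<String> tokens = new ArrayList<String>();
        for (String sentence : sentences) {
            String normalized = sentence.trim().toLowerCase(Locale.ROOT);
            for (String word : normalized.split("\\s+")) {
                if (!word.isEmpty()) {
                    tokens.add(word);
                }
            }
            tokens.add(MARKER);
        }
        return tokens;
    }
}
```

Because every sentence ends with its own marker position, any span that crosses from one sentence
into the next necessarily contains a marker, which is what the SpanNotQuery exclusion relies on.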

Mark's suggestion would work for your within-sentence case.  For the case where you don't care
about sentence boundaries, you can use the SpanNearQuery on its own, without the SpanNotQuery.
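
The difference between the two cases can be sketched in plain Java as a model of the matching
logic. This is not Lucene code: ProximitySketch, near, and the sameSentence flag are hypothetical
names, and the slop rule (number of positions between the two terms) is my reading of an unordered
two-term SpanNearQuery.

```java
import java.util.List;

// Hypothetical model of the query logic, not part of Lucene.
final class ProximitySketch {
    static final String MARKER = "$";

    // Returns true if terms a and b occur with at most `slop` positions
    // between them. When sameSentence is true, a candidate match is
    // rejected if a marker token lies between the two terms, which is
    // the role SpanNotQuery plays around the SpanNearQuery.
    static boolean near(List<String> tokens, String a, String b,
                        int slop, boolean sameSentence) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (i == j || !tokens.get(j).equals(b)) continue;
                int lo = Math.min(i, j);
                int hi = Math.max(i, j);
                if (hi - lo - 1 > slop) continue;      // too far apart
                if (!sameSentence || !hasMarker(tokens, lo, hi)) {
                    return true;
                }
            }
        }
        return false;
    }

    // True if a marker sits strictly between positions lo and hi.
    private static boolean hasMarker(List<String> tokens, int lo, int hi) {
        for (int k = lo + 1; k < hi; k++) {
            if (tokens.get(k).equals(MARKER)) return true;
        }
        return false;
    }
}
```

Note that in this model the marker occupies a position of its own, so in the boundary-insensitive
case each sentence boundary between the two terms uses up one unit of slop.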

Using this technique, a single field should serve all of your needs.

Steve

> -----Original Message-----
> From: Sirish Vadala [mailto:sirishreddy@gmail.com]
> Sent: Wednesday, October 06, 2010 3:19 PM
> To: java-user@lucene.apache.org
> Subject: RE: Issue with sentence specific search
> 
> 
> Hmmm... My mistake.
> 
> In fact, it is not a phrase search but a proximity search.
> 
> My screen gives four options to the user: -All words, -Exact phrase, -At
> least one word, -Within proximity of xx words.
> 
> In case of -All words and -At least one word, this is irrelevant and
> everything works fine.
> 
> In case of -Exact phrase, I do need to make it sentence specific, which
> works well with my current implementation.
> 
> In case of -Within proximity of xx words, the user wants to have an
> option to check either within xx words in the same sentence or without
> any sentence boundaries.
> 
> I am using the following code to perform proximity search:
> 
> -----
> QueryParser qParser = new QueryParser(Version.LUCENE_29, field,
> this.analyzer);
> qParser.setDefaultOperator(QueryParser.OR_OPERATOR);
> query = qParser.parse(strQuery);
> 
> //strQuery format --> "Search Text"~SPAN
> 
> -----
> bQuery.add(query, BooleanClause.Occur.MUST);
> -----
> 
> this.analyzer is my custom analyzer. To implement sentence-specific
> search, as I already said, I am currently adding each sentence as a
> separate field (with the same field name) to the same document. I am also
> setting the position increment gap, by sub-classing Analyzer and
> overriding Analyzer#getPositionIncrementGap() to return 10.
> 
> Since for each sentence, the position increment gap is modified, I am not
> sure if I can perform a sentence independent proximity search.
> 
> I apologize for not explaining this well earlier, and I appreciate any
> responses.
