lucene-java-user mailing list archives

From: Sirish Vadala
Subject: RE: Issue with sentence specific search
Date: Thu, 07 Oct 2010 23:13:15 GMT

Hi Steven,

I have implemented sentence-specific proximity search as suggested below.
Unfortunately, the search still doesn't respect sentence boundaries.

I am using # as a delimiter between my sentences while indexing the content:

ArrayList<String> sentencesList = sentenceScanner.getAllSentences();
StringBuffer textWithToken = new StringBuffer();
for (String sentence : sentencesList) {
	textWithToken.append(sentence + " # ");
}
addFieldToDocument(document, IFIELD_TEXT, textWithToken.toString(), true,
* Used StandardAnalyzer to initialize the indexWriter while adding the
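
The indexing step above can be sketched in self-contained form. This is only an illustration, not the actual indexing code: the class name DelimitedTextBuilder and the method joinWithDelimiter are hypothetical, and the delimiter is passed as a parameter so that different boundary markers can be compared.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene or of the original code.
class DelimitedTextBuilder {

    // Appends " <delimiter> " after every sentence, mirroring the loop above.
    static String joinWithDelimiter(List<String> sentences, String delimiter) {
        StringBuilder text = new StringBuilder();
        for (String sentence : sentences) {
            text.append(sentence).append(' ').append(delimiter).append(' ');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("Efficiency improved.", "The delta was small.");
        // The resulting string is what would be handed to the analyzer at index time.
        System.out.println(joinWithDelimiter(sentences, "#"));
    }
}
```

Whether the marker survives as a term in the index depends on the analyzer, not on how the string is built.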

This is how I am performing my search:

Query query = null;
strQuery = strQuery.replaceAll("\\s+", " ");
String[] spanTerms = strQuery.split(" ");
SpanQuery[] spanQueries = new SpanQuery[spanTerms.length];
for (int count = 0; count < spanTerms.length; count++) {
	String spanTerm = spanTerms[count];
	spanQueries[count] = new SpanTermQuery(new Term(field, spanTerm));
}
	SpanQuery spanQuery = new SpanNearQuery(spanQueries, span, true);
	query = spanQuery;
} else if (withinSentence) {
	SpanQuery queryInclude = new SpanNearQuery(spanQueries, span, true);
	SpanQuery queryExclude = new SpanTermQuery(new Term(field, "#"));
	SpanQuery spanNotQuery = new SpanNotQuery(queryInclude, queryExclude);
	query = spanNotQuery;
}
bQuery.add(query, BooleanClause.Occur.MUST);


When I print the final query to the console, this is how it looks in both
cases:
With no sentence boundary
+(author:amanda) +spanNear([text:efficiency, text:delta], 10, true)
+(year:2009 year:2010)

With sentence boundary
+(author:amanda) +spanNot(spanNear([text:efficiency, text:delta], 10, true),
text:#) +(year:2009 year:2010)

My guess is that my index isn't storing the sentence boundary marker # as a
separate term. Any hints or pointers on where my implementation goes wrong
would be highly appreciated.
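
To make that guess concrete, here is a toy model in plain Java that does not use Lucene at all. It assumes (this is the hypothesis, not a confirmed diagnosis) that the analyzer discards a bare # at index time. The class SentenceBoundaryDemo and both methods are hypothetical stand-ins: tokenize imitates an analyzer that either keeps or drops tokens without letters or digits, and matchesWithinSentence imitates an ordered near match that is rejected when a # term lies inside the span.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names come from Lucene.
class SentenceBoundaryDemo {

    // Splits on whitespace and lowercases. When keepDelimiter is false,
    // tokens without any letter or digit (such as "#") are dropped.
    static List<String> tokenize(String text, boolean keepDelimiter) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            boolean hasWordChar = raw.matches(".*[A-Za-z0-9].*");
            if (hasWordChar || keepDelimiter) {
                tokens.add(raw.toLowerCase());
            }
        }
        return tokens;
    }

    // True if `first` is followed by `second` with at most `slop` tokens
    // in between and no "#" token between them.
    static boolean matchesWithinSentence(List<String> tokens, String first,
                                         String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            for (int j = i + 1; j < tokens.size() && j - i - 1 <= slop; j++) {
                if (tokens.get(j).equals("#")) {
                    break; // a boundary lies inside the span, so exclude it
                }
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "Efficiency improved this year # The delta was small #";
        // Delimiter dropped: nothing to exclude, so the cross-sentence match survives.
        System.out.println(matchesWithinSentence(tokenize(text, false), "efficiency", "delta", 10)); // true
        // Delimiter kept as a term: the cross-sentence match is rejected.
        System.out.println(matchesWithinSentence(tokenize(text, true), "efficiency", "delta", 10)); // false
    }
}
```

If the real index behaves like the first case, the spanNot clause has no # positions to exclude, which would explain why both queries return the same hits.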
