lucene-java-user mailing list archives

From Sirish Vadala <sirishre...@gmail.com>
Subject RE: Issue with sentence specific search
Date Thu, 07 Oct 2010 23:13:15 GMT

Hi Steven,

I have implemented the sentence-specific proximity search as suggested below.
Unfortunately, it still does not respect sentence boundaries in my searches.

I am using # as a delimiter between my sentences while indexing the content:

------------
ArrayList<String> sentencesList = sentenceScanner.getAllSentences();
// Join the sentences, placing " # " after each one as a boundary marker.
StringBuilder textWithToken = new StringBuilder();
for (String sentence : sentencesList) {
	textWithToken.append(sentence).append(" # ");
}
addFieldToDocument(document, IFIELD_TEXT, textWithToken.toString(), true, true);
------------
* The IndexWriter that adds the document is initialized with StandardAnalyzer.

This is how I am performing my search:

------------
Query query;
// Trim and collapse runs of whitespace so split() yields no empty terms.
strQuery = strQuery.trim().replaceAll("\\s+", " ");
String[] spanTerms = strQuery.split(" ");
// One SpanTermQuery per query term, in the order typed.
SpanQuery[] spanQueries = new SpanQuery[spanTerms.length];
for (int count = 0; count < spanTerms.length; count++) {
	spanQueries[count] = new SpanTermQuery(new Term(field, spanTerms[count]));
}
// Ordered proximity match over all terms, with slop `span`.
SpanQuery nearQuery = new SpanNearQuery(spanQueries, span, true);
if (withinSentence) {
	// Reject any proximity match that overlaps the sentence delimiter term.
	SpanQuery queryExclude = new SpanTermQuery(new Term(field, "#"));
	query = new SpanNotQuery(nearQuery, queryExclude);
} else {
	query = nearQuery;
}
bQuery.add(query, BooleanClause.Occur.MUST);

------------
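The behaviour this query is meant to produce can be modelled at the level of token positions. The sketch below is plain Java, not Lucene: `SentenceSpanSketch` and `matches` are hypothetical names, it handles only two terms, and it assumes the delimiter is actually present as a token in the stream, which is exactly the part in doubt here.

```java
import java.util.Arrays;
import java.util.List;

class SentenceSpanSketch {
    // Simplified model of spanNot(spanNear([first, second], slop, true), boundary).
    // Returns true if `second` follows `first` with at most `slop` tokens in
    // between and no `boundary` token between them. Pass null to skip the
    // boundary check (the plain spanNear case).
    static boolean matches(List<String> tokens, String first, String second,
                           int slop, String boundary) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            for (int j = i + 1; j < tokens.size() && j - i - 1 <= slop; j++) {
                if (boundary != null && tokens.get(j).equals(boundary)) {
                    break; // any later match from this start would cross a sentence boundary
                }
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed token stream in which "#" survived analysis as its own token.
        List<String> tokens = Arrays.asList(
                "efficiency", "improved", "#", "delta", "was", "stable");
        System.out.println(matches(tokens, "efficiency", "delta", 10, null)); // true
        System.out.println(matches(tokens, "efficiency", "delta", 10, "#"));  // false
    }
}
```

If the delimiter never makes it into the token stream, the second call has nothing to stop on and both cases return the same result, which is the symptom described here.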

When I print the final query to the console, this is what it looks like in
each case:

With no sentence boundary
+(author:amanda) +spanNear([text:efficiency, text:delta], 10, true) +(year:2009 year:2010)

With sentence boundary
+(author:amanda) +spanNot(spanNear([text:efficiency, text:delta], 10, true), text:#) +(year:2009 year:2010)

My guess is that my index isn't storing the sentence boundary marker # as a
separate term. Any hints or pointers on where my implementation goes wrong
would be highly appreciated.
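One way to reason about that guess without touching the index is to look at what a punctuation-discarding tokenizer does with the delimiter. The sketch below is plain Java and only approximates an analyzer: the `tokenize` helper and the `SENTBREAK` sentinel are hypothetical, and splitting on every non-letter, non-digit character is an assumption standing in for StandardAnalyzer, not its real grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class DelimiterCheck {
    // Hypothetical stand-in for an analyzer: lowercases the text and keeps
    // only runs of letters and digits, so punctuation-only tokens vanish.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "#" is pure punctuation, so it never becomes a token.
        System.out.println(tokenize("Efficiency improved. # Delta was stable. # "));
        // An all-letter sentinel survives as a searchable term.
        System.out.println(tokenize("Efficiency improved. SENTBREAK Delta was stable. SENTBREAK "));
    }
}
```

If the real analyzer treats `#` the same way, the `text:#` clause has nothing to match and the spanNot query behaves exactly like the plain spanNear query. Printing the tokens the actual analyzer emits for a sample sentence would confirm or rule this out.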

Thanks.