lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven A Rowe <sar...@syr.edu>
Subject RE: Issue with sentence specific search
Date Fri, 08 Oct 2010 02:27:01 GMT
Hi Sirish,

StandardTokenizer does not produce a token from '#', as you suspected.  Something that fits
the "word" definition, but which won't ever be encountered in your documents, is what you
should use for the delimiter - something like a1b2c3c2b1a .

Sentence boundary handling is clunky in Lucene right now - there has been some discussion
of how to directly support this kind of thing, but no code at this point.

Steve

> -----Original Message-----
> From: Sirish Vadala [mailto:sirishreddy@gmail.com]
> Sent: Thursday, October 07, 2010 7:13 PM
> To: java-user@lucene.apache.org
> Subject: RE: Issue with sentence specific search
> 
> 
> Hi Steven,
> 
> I have implemented sentence specific proximity search as suggested below.
> However, unfortunately it still doesn't identify the sentence boundaries
> for
> my search.
> 
> I am using # as a delimiter between my sentences while indexing the
> content:
> 
> ------------
> ArrayList<String> sentencesList = sentenceScanner.getAllSentences();
> StringBuffer textWithToken = new StringBuffer();
> for (String sentence : sentencesList){
> 	textWithToken.append(sentence + " # ");
> }
> addFieldToDocument(document, IFIELD_TEXT, textWithToken.toString(), true,
> true);
> ------------
> * Used StandardAnalyzer to initialize the indexWriter while adding the
> document
> 
> This is how I am performing my search:
> 
> ------------
> Query query = null;
> strQuery = strQuery.replaceAll("\\s+", " ");
> String[] spanTerms = strQuery.split(" ");
> SpanQuery[] spanQueries = new SpanQuery[spanTerms.length];
> for (int count = 0; count < spanTerms.length; count++) {
> 	String spanTerm = spanTerms[count];
> 	spanQueries[count] = new SpanTermQuery(new Term(field, spanTerm));
> }
> if(!withinSentence){
> 	SpanQuery spanQuery = new SpanNearQuery(spanQueries, span, true);
> 	query = spanQuery;
> } else if (withinSentence){
> 	SpanQuery queryInclude = new SpanNearQuery(spanQueries, span, true);
> 	SpanQuery queryExclude = new SpanTermQuery(new Term(field, "#"));
> 	SpanQuery spanNotQuery = new SpanNotQuery(queryInclude,
> queryExclude);
> 	query = spanNotQuery;
> }
> bQuery.add(query, BooleanClause.Occur.MUST);
> 
> ------------
> 
> When I eventually read my query on the console, this is how it looks in
> both
> cases:
> 
> With no sentence boundary
> +(author:amanda) +spanNear([text:efficiency, text:delta], 10, true)
> +(year:2009 year:2010)
> 
> With sentence boundary
> +(author:amanda) +spanNot(spanNear([text:efficiency, text:delta], 10,
> true),
> text:#) +(year:2009 year:2010)
> 
> My guess is that probably, my index isn't saving the sentence boundary
> value
> # as a separate term. Any hints or pointers on where exactly I am
> mis-implementing would be highly appreciated.
> 
> Thanks.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Issue-
> with-sentence-specific-search-tp1644352p1651512.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

Mime
View raw message