lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Koji Sekiguchi <k...@r.email.ne.jp>
Subject SpanScorer problem?
Date Fri, 17 Jul 2009 10:44:02 GMT
Hello,

This problem was reported by my customer. They are using Solr 1.3
and uni-gram, but it can be reproduced with Lucene 2.9 and
WhitespaceAnalyzer.
The program for reproducing is at the end of this mail.

Query:
(f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")

The snippet we expected is:
x y z <B>a</B> <B>b</B> <B>c</B> <B>d</B>
e f g <B>b</B> <B>c</B> <B>g</B>

but we got:
x y z <B>a</B> b c <B>d</B> e f g <B>b</B> <B>c</B>
<B>g</B>

Thank you,

Koji
---------

public class TestHighlighter {

static final String CONTENT = "x y z a b c d e f g b c g";
static final String PH1 = "\"a b c d\"";
static final String PH2 = "\"b c g\"";
static final String F1 = "f1";
static final String F2 = "f2";
static final String F1C = F1 + ":";
static final String F2C = F2 + ":";
static final String QUERY_STRING =
"(" + F1C + PH1 + " OR " + F2C + PH1 + ") AND ("
+ F1C + PH2 + " OR " + F2C + PH2 + ")";
static Analyzer analyzer = new WhitespaceAnalyzer();

public static void main(String[] args) throws Exception {
QueryParser qp = new QueryParser( F1, analyzer );
Query query = qp.parse( QUERY_STRING );
CachingTokenFilter stream = new CachingTokenFilter(
analyzer.tokenStream( F1, new StringReader( CONTENT ) ) );
Scorer scorer = new SpanScorer( query, F1, stream, false );
Highlighter h = new Highlighter( scorer );
System.out.println( "query : " + QUERY_STRING );
System.out.println( h.getBestFragment( analyzer, F1, CONTENT ) );
}
}


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message