lucene-java-user mailing list archives

From Paul Libbrecht
Subject trying to use the highlighter
Date Fri, 03 Sep 2010 12:05:53 GMT

Hello list,

I'm struggling again with the highlighter. I don't understand why I sporadically get an InvalidTokenOffsetsException.

The mission: given a query, detect which field was matched among the names of the concepts.
There can be several names for a given concept, even within one language. Concepts are documents,
and names are stored in fields name-xx, where xx is the two-letter language code.
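
As an illustration of that naming scheme only, here is a minimal sketch in plain Java with no Lucene dependency. The class NameFields and its methods are hypothetical helpers and are not part of the method shown in this message. It assumes the language suffix is exactly two characters long.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the "name-xx" field naming convention.
final class NameFields {
    static final String PREFIX = "name-";

    // True if the field name looks like "name-xx" with a two-character suffix (assumption).
    static boolean isNameField(String fieldName) {
        return fieldName.startsWith(PREFIX)
                && fieldName.length() == PREFIX.length() + 2;
    }

    // Returns the language part of a "name-xx" field name, e.g. "fr" for "name-fr".
    static String languageOf(String fieldName) {
        if (!isNameField(fieldName)) {
            throw new IllegalArgumentException("not a name field: " + fieldName);
        }
        return fieldName.substring(PREFIX.length());
    }

    // Keeps only the name fields, preserving order and duplicates,
    // since a concept can carry several names in the same language.
    static List<String> nameFields(List<String> fieldNames) {
        List<String> result = new ArrayList<String>();
        for (String fieldName : fieldNames) {
            if (isNameField(fieldName)) {
                result.add(fieldName);
            }
        }
        return result;
    }
}
```

With this sketch, a document holding the fields uri, name-en, name-en and name-de would yield three candidate name fields, two of them in English.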

Here's the method I'm using:

    public String computeMatchedField(int docNum, Document doc, Analyzer analyzer, Query query) throws IOException {
        //System.out.println("----- computing matched field for query " + query + " on document " + doc.get("uri"));
        query = query.rewrite(this.reader);
        String found = null;
        float maxScore = 0;
        try {
            for(Field f: (List<Field>) doc.getFields()) {
                QueryScorer scorer = new QueryScorer(query, reader, f.name());
                // only the name-xx fields are of interest
                if(!f.name().startsWith("name-")) continue;
                //System.out.println("Measuring field " + f.name() + ": " + f.stringValue());
                String text = f.stringValue();
                TokenStream tokenStream = TokenSources.getAnyTokenStream(reader, docNum, f.name(), doc, analyzer);
                SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
                Highlighter highlighter = new Highlighter(htmlFormatter, scorer);
                TextFragment[] frags = highlighter.getBestTextFragments(tokenStream, text, false, 1);
                if(frags==null || frags.length==0) continue;
                float score = frags[0].getScore();
                //System.out.println("Score: " + score);
                // keep the best-scoring fragment seen so far
                if(score > maxScore) {
                    maxScore = score;
                    found = frags[0].toString();
                }
            }
        } catch(Exception ex) {ex.printStackTrace();}
        return found;
    }

Unfortunately, I have to catch InvalidTokenOffsetsException, which does happen sometimes, not always.
When it occurs, it stops the highlighting (the detected field is "null") and also costs quite
some time.

What am I doing wrong?
I tried making my own tokenStream with no difference.

thanks in advance
