lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: trying to break up highlighted text on line boundaries
Date Tue, 14 Aug 2007 17:31:53 GMT

On 13-Aug-07, at 6:18 PM, Benjamin Higgins wrote:

> (using last night's Solr build)
>
> Can't seem to get this to work.  I am trying to use the regex
> highlighter fragment type.  The docs suggest looking at the example
> solrconfig.xml for a demonstration of a fragmenter that splits on
> sentences.  It looks like this:
>
> <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
>
> This confuses me somewhat.  I would have expected perhaps something
> that splits on sentence punctuation like [.!?], but this seems to be
> the reverse (perhaps so that the punctuation is included?).  Still,
> why isn't it [^.!?]?  I read the regex as matching between 20 and 200
> characters that are one of dash, alphanumeric, space, comma, slash,
> newline, double or single quote.

The pattern should describe what you _want_ a fragment to look like,
not the delimiters between fragments.  The reason is that the
desirable fragments often cover only part of the text (wheat from
chaff), and you don't necessarily want a fragment to start exactly
where the previous one ended.
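
To make that concrete, here is a rough sketch in Python rather than
Java (this particular pattern behaves the same in both regex engines).
It only shows which spans of a made-up sample text the example pattern
accepts as candidate fragments; it does not reproduce how the
highlighter then combines candidates with fragsize and slop.

```python
import re

# Pattern from the example solrconfig.xml (the \" escape is not needed here).
PATTERN = re.compile(r"""[-\w ,/\n"']{20,200}""")

# Made-up sample text: two short sentences and two long ones.
text = (
    "Short one. This sentence is long enough to qualify as a fragment! "
    "Ok? Another reasonably long candidate, with a comma."
)

# Each match is a run of 20 to 200 "fragment-like" characters.
# Sentence punctuation is not in the character class, so it ends a run.
fragments = [m.group(0) for m in PATTERN.finditer(text)]

for fragment in fragments:
    print(repr(fragment))
```

The short runs ("Short one", "Ok") are skipped because they fall under
the 20-character minimum, which is the "chaff" being discarded.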

> Anyway I have tried many many patterns, and I can't often tell how
> they are working.  I certainly haven't been able to split on line
> boundaries.

What are your fragsize/slop settings relative to the size of the lines
you want to match?

Try something like:

hl.regex.pattern: [^\n]+
hl.regex.slop: 1.0
hl.fragsize: <maximum line length>
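
As a quick sanity check outside Solr, the sketch below (Python, with a
made-up sample text) shows what [^\n]+ matches and one way to pick a
fragsize from the longest line.  The hl.fragmenter=regex parameter is
my assumption about how the regex fragmenter is selected in your setup,
and the parameter values are illustrative only.

```python
import re
from urllib.parse import urlencode

# Same idea as the hl.regex.pattern above: one match per non-empty line.
LINE_PATTERN = re.compile(r"[^\n]+")

# Made-up sample field content, including a blank line.
text = (
    "first line of the file\n"
    "second, somewhat longer line\n"
    "\n"
    "fourth line after a blank one\n"
)

lines = [m.group(0) for m in LINE_PATTERN.finditer(text)]
longest = max(len(line) for line in lines)

# Highlighting parameters as they might be sent with a query.
# Assumption: hl.fragmenter=regex selects the regex fragmenter.
params = {
    "hl": "true",
    "hl.fragmenter": "regex",
    "hl.regex.pattern": LINE_PATTERN.pattern,
    "hl.regex.slop": "1.0",
    "hl.fragsize": str(longest),
}
query_string = urlencode(params)

for line in lines:
    print(repr(line))
print(query_string)
```

The blank line produces no match, so every candidate fragment is
exactly one non-empty line.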

Let me know how that goes,
-Mike
