lucene-solr-dev mailing list archives

From "Andrew May (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-37) Add additional configuration options for Highlighting
Date Wed, 30 Aug 2006 14:40:22 GMT
    [ http://issues.apache.org/jira/browse/SOLR-37?page=comments#action_12431590 ] 
            
Andrew May commented on SOLR-37:
--------------------------------

I've spent a bit of time trying to understand Gradient formatting and how QueryScorer is used.
As I didn't see any very good documentation for this (I may have missed it) - I thought I'd
share.

It appears that GradientFormatter colours according to the term's weight within the index
- so terms that appear less frequently in the index will be coloured closer to the max foreground/background
colour. So, the colour is not related to the specific document or fragment being evaluated
and that term will be highlighted the same for the entire results set. If two terms appear
with a similar frequency in the index they will have similar colours - and this seems to happen
a lot (perhaps because scaling is done between 0 and maxWeight rather than minWeight and maxWeight).
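
To illustrate the scaling point with made-up numbers: the sketch below compares
normalising a weight against the range 0..maxWeight with normalising it against
minWeight..maxWeight. The class name, method names and weights are hypothetical and
are not taken from the Lucene highlighter source.

```java
final class WeightScalingSketch {

    /** Scales a weight into [0, 1] relative to the range 0..maxWeight. */
    static float scaleFromZero(float weight, float maxWeight) {
        if (maxWeight <= 0f) {
            return 0f;
        }
        return Math.min(weight, maxWeight) / maxWeight;
    }

    /** Scales a weight into [0, 1] relative to the range minWeight..maxWeight. */
    static float scaleFromMin(float weight, float minWeight, float maxWeight) {
        if (maxWeight <= minWeight) {
            return 1f; // degenerate range: every term gets the same colour
        }
        float clamped = Math.max(minWeight, Math.min(weight, maxWeight));
        return (clamped - minWeight) / (maxWeight - minWeight);
    }

    public static void main(String[] args) {
        // Hypothetical term weights that lie close together.
        float[] weights = {4.0f, 4.5f, 5.0f};
        for (float w : weights) {
            System.out.println(w + " -> fromZero=" + scaleFromZero(w, 5.0f)
                    + ", fromMin=" + scaleFromMin(w, 4.0f, 5.0f));
        }
    }
}
```

With these weights the 0..maxWeight scaling only uses the top fifth of the gradient
(0.8 to 1.0), while the minWeight..maxWeight scaling spreads the same terms across
the whole range (0.0 to 1.0).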

There's also a fairly serious bug in the colouring that makes a lot of combinations give meaningless
results (e.g. minBg=#FF0000, maxBg=#00FF00 will give results coloured #FFFF00) - see GradientFormatter.getColorVal().
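
As a sketch of how a result like #FFFF00 could arise: if each colour channel is
computed by adding the scaled distance to the smaller of the two endpoints, a channel
that is meant to decrease (red going from FF to 00) increases instead. This is an
assumption about the kind of bug involved, not a copy of
GradientFormatter.getColorVal(); all names below are hypothetical.

```java
final class GradientColourSketch {

    private static float clamp01(float f) {
        return Math.max(0f, Math.min(1f, f));
    }

    /** Linear interpolation from min to max; also correct when min > max. */
    static int interpolateChannel(int min, int max, float fraction) {
        return Math.round(min + (max - min) * clamp01(fraction));
    }

    /** Direction-blind variant: always counts up from the smaller endpoint. */
    static int directionBlindChannel(int min, int max, float fraction) {
        return Math.min(min, max) + (int) (Math.abs(min - max) * clamp01(fraction));
    }

    /** Reads one two-digit hex channel from a colour such as "#FF0000". */
    private static int channel(String hex, int index) {
        int start = 1 + 2 * index;
        return Integer.parseInt(hex.substring(start, start + 2), 16);
    }

    /** Blends two "#RRGGBB" colours channel by channel. */
    static String blend(String minHex, String maxHex, float fraction, boolean directionAware) {
        int[] rgb = new int[3];
        for (int i = 0; i < 3; i++) {
            int lo = channel(minHex, i);
            int hi = channel(maxHex, i);
            rgb[i] = directionAware
                    ? interpolateChannel(lo, hi, fraction)
                    : directionBlindChannel(lo, hi, fraction);
        }
        return String.format("#%02X%02X%02X", rgb[0], rgb[1], rgb[2]);
    }

    public static void main(String[] args) {
        System.out.println(blend("#FF0000", "#00FF00", 1.0f, false)); // #FFFF00
        System.out.println(blend("#FF0000", "#00FF00", 1.0f, true));  // #00FF00
    }
}
```

Any min/max pair in which at least one channel decreases would be affected in the
same way by the direction-blind variant.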

In other words, I now agree with Mike that we should not support Gradient formatting. Perhaps
we still want to retain the hl.formatter= parameter in case we have any other values than
"simple" in the future - and keep hl.simple.pre and hl.simple.post as they are.

As for the QueryScorer, I think it makes sense to support all three ways it can be constructed:
1) hl.scoring=simple (the default)  - construct with Query only. May have some matches from
other terms, but allows you to highlight different fields to the ones searched.
2) hl.scoring=field - constructed with Query and fieldName. Only highlights terms matched
in this field by the query.
3) hl.scoring=fieldidx - constructed with Query, fieldName and IndexReader. I think the selection
of the best fragment(s) will be improved because the terms will be weighted according to their
frequency in the index - but this has to be more costly as it calls IndexReader.docFreq for
each term.
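
A minimal sketch of how the three proposed hl.scoring values might be parsed and
mapped to what each QueryScorer construction needs. The enum, its method names and
the fall-back-to-simple behaviour for unknown values are assumptions made for
illustration; the sketch deliberately does not call Lucene, so the actual
constructor signatures should be checked against the highlighter javadoc.

```java
import java.util.Locale;

enum ScoringMode {
    SIMPLE,    // construct with the Query only
    FIELD,     // construct with the Query and the field name
    FIELDIDX;  // construct with the Query, the field name and an IndexReader

    /** Parses an hl.scoring value; null, blank or unknown values give SIMPLE. */
    static ScoringMode parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return SIMPLE;
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return SIMPLE; // assumption: unknown values use the default
        }
    }

    /** True when highlighting is restricted to terms matched in one field. */
    boolean needsFieldName() {
        return this != SIMPLE;
    }

    /** True when terms are weighted by index frequency (one docFreq call per term). */
    boolean needsIndexReader() {
        return this == FIELDIDX;
    }
}
```

A caller could check needsIndexReader() before deciding whether to hand the scorer
an IndexReader, so the extra docFreq cost is only paid when hl.scoring=fieldidx is
requested.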

Does that sound reasonable?

> Add additional configuration options for Highlighting
> -----------------------------------------------------
>
>                 Key: SOLR-37
>                 URL: http://issues.apache.org/jira/browse/SOLR-37
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Andrew May
>         Attachments: patch, patch, patch.diff
>
>
> As discussed in the mailing list, I've been looking at adding additional configuration
> options for highlighting.
> I've made quite a few changes to the properties for highlighting:
> Properties that can be set on request, or in solrconfig.xml at the top level:
>   highlight (true/false)
>   highlightFields
> Properties that can be set in solrconfig.xml at the top level or per-field
>   formatter (simple/gradient)
>   formatterPre (preTag for simple formatter)
>   formatterPost (postTag for simple formatter)
>   formatterMinFgCl (min foreground colour for gradient formatter)
>   formatterMaxFgCl (max foreground colour for gradient formatter)
>   formatterMinBgCl (min background colour for gradient formatter)
>   formatterMaxBgCl (max background colour for gradient formatter)
>   fragsize (if <=0 use NullFragmenter, otherwise use GapFragmenter with this value)
> I've added variables for these values to CommonParams, plus there's a fields
> Map<String,CommonParams> that is parsed from nested NamedLists (i.e. a lst named
> "fields", with a nested lst for each field).
> Here's a sample of how you can mix and match properties in solrconfig.xml:
>   <requestHandler name="hl" class="solr.StandardRequestHandler" >
>     <str name="formatter">simple</str>
>     <str name="formatterPre">&lt;i></str>
>     <str name="formatterPost">&lt;/i></str>
>     <str name="highlightFields">title,authors,journal</str>
>     <int name="fragsize">0</int>
>     <lst name="fields">
>       <lst name="abstract">
>         <str name="formatter">gradient</str>
>         <str name="formatterMinBgCl">#FFFF99</str>
>         <str name="formatterMaxBgCl">#FF9900</str>
>         <int name="fragsize">30</int>
>         <int name="maxSnippets">2</int>
>       </lst>
>       <lst name="authors">
>         <str name="formatterPre">&lt;strong></str>
>         <str name="formatterPost">&lt;/strong></str>
>       </lst>
>     </lst>
>   </requestHandler>
> I've created HighlightingUtils to handle most of the parameter parsing, but the
> highlighting is still done in SolrPluginUtils. The doStandardHighlighting() method
> still has the same signature, but the other highlighting methods have had to be
> changed (because highlighters are now created per highlighted field).
> I'm not particularly happy with the code to pull parameters from CommonParams, first
> checking the field then falling back, e.g.:
>          String pre = (params.fields.containsKey(fieldName)
>                && params.fields.get(fieldName).formatterPre != null) ?
>                params.fields.get(fieldName).formatterPre :
>                   params.formatterPre != null ? params.formatterPre : "<em>";
> I've removed support for a custom formatter - just choosing between simple/gradient.
> Probably that's a bad decision, but I wanted an easy way to choose between the standard
> formatters without having to invent a generic way of supplying arguments for the
> constructor. Perhaps there should be formatterType=simple/gradient and
> formatterClass=... which overrides formatterType if set at a lower level - with the
> formatterClass having to have a zero-args constructor?
> Note: gradient is actually SpanGradientFormatter.
> I'm not sure I properly understand how Fragmenters work, so supplying fragsize to
> GapFragmenter where >0 (instead of what was a default of 50) may not make sense.
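
The per-field fallback quoted in the issue description (field value, then top-level
value, then a hard-coded default) could be factored into one helper so the chain is
written only once. The sketch below is a stand-in, not the patch's code:
HighlightParamsSketch is a hypothetical substitute for CommonParams, resolve() is an
invented name, and it uses Java 8 lambdas, which post-date this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class HighlightParamsSketch {
    // null means "not configured at this level"
    String formatterPre;
    String formatterPost;

    // Per-field overrides, keyed by field name (mirrors the nested "fields" lst).
    final Map<String, HighlightParamsSketch> fields = new HashMap<>();

    /** Returns the per-field value, else the top-level value, else the default. */
    <T> T resolve(String fieldName, Function<HighlightParamsSketch, T> getter, T defaultValue) {
        HighlightParamsSketch perField = fields.get(fieldName);
        if (perField != null) {
            T fieldValue = getter.apply(perField);
            if (fieldValue != null) {
                return fieldValue;
            }
        }
        T topLevel = getter.apply(this);
        return topLevel != null ? topLevel : defaultValue;
    }

    public static void main(String[] args) {
        HighlightParamsSketch params = new HighlightParamsSketch();
        params.formatterPre = "<i>";

        HighlightParamsSketch authors = new HighlightParamsSketch();
        authors.formatterPre = "<strong>";
        params.fields.put("authors", authors);

        System.out.println(params.resolve("authors", p -> p.formatterPre, "<em>")); // <strong>
        System.out.println(params.resolve("title", p -> p.formatterPre, "<em>"));   // <i>
    }
}
```

The same call shape would work for the other properties, for example
params.resolve(fieldName, p -> p.formatterPost, "</em>").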

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
