lucene-solr-dev mailing list archives

From "Mike Klaas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-24) Add Highlighting to standard request handler
Date Mon, 10 Jul 2006 07:50:30 GMT
    [ http://issues.apache.org/jira/browse/SOLR-24?page=comments#action_12419995 ] 

Mike Klaas commented on SOLR-24:
--------------------------------

> This was done because there is an optimization that can use a filter (the set of all
> documents matching a query) to satisfy a sorted query if scores aren't needed. This bypasses
> re-executing the query.

I'll ensure this shortcut remains possible with StandardRequestHandler.  It should probably be
added to DisMax too, in that case.
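
A minimal sketch of the shortcut described in the quote, to make clear what is being skipped. It assumes the filter is available as a plain java.util.BitSet of matching doc ids and that the sort field is a long per document; FilterSortSketch and its method are hypothetical names, not Solr or Lucene classes.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Illustrative only: none of these names are Solr or Lucene APIs. */
class FilterSortSketch {

    /**
     * Returns up to `limit` doc ids taken from `filter`, ordered by ascending
     * sort-field value (ties broken by doc id). The query itself is never run:
     * the set of matches is already known and no scores are needed.
     */
    static List<Integer> topDocsFromFilter(BitSet filter, long[] sortValues, int limit) {
        List<Integer> docs = new ArrayList<>();
        // Walk the set bits, i.e. the ids of all documents matching the query.
        for (int doc = filter.nextSetBit(0); doc >= 0; doc = filter.nextSetBit(doc + 1)) {
            docs.add(doc);
        }
        docs.sort((a, b) -> {
            int c = Long.compare(sortValues[a], sortValues[b]);
            return c != 0 ? c : Integer.compare(a, b);
        });
        return new ArrayList<>(docs.subList(0, Math.min(limit, docs.size())));
    }
}
```

If scores were requested, this path could not be used, since a bare set of doc ids carries no score information.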

> Regarding gaps... I can see how one would need to rely on a position gap when using term-vectors...
> but when re-analyzing stored fields, they are already discrete. Is the problem caused
> by the highlighter architecture (I haven't used it before)?

Highlighter takes a token stream and a piece of text and fragments the text.  The fragments are
scored to determine which top set of them to return.  There are only two ways to make it work on
multiple token streams.  One is to invoke it multiple times, in which case we have to find a way
of merging the highlighting output from each invocation in a nice way.  The other is to fool
Highlighter into thinking it is seeing a single stream, while ensuring separation among the
various parts.  The second option is attractive since Highlighter then compares the fragments
from all the parts and picks the globally highest-scoring fragments.
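
To illustrate the single-stream idea, here is a rough, self-contained sketch. It does not use the Lucene Highlighter at all: SingleStreamSketch, Token and Fragment are hypothetical, tokenization is plain whitespace splitting, and a fragment's score is just the number of query terms it contains. What it shows is the offset shifting that turns several values into one stream, the separation that stops fragments from crossing a value boundary, and the global ranking across all parts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: these classes are hypothetical, not the Lucene Highlighter API. */
class SingleStreamSketch {

    /** A token with offsets into the concatenated text and the index of its source value. */
    static final class Token {
        final String term; final int start; final int end; final int part;
        Token(String term, int start, int end, int part) {
            this.term = term; this.start = start; this.end = end; this.part = part;
        }
    }

    static final class Fragment {
        final String text; final int score;
        Fragment(String text, int score) { this.text = text; this.score = score; }
    }

    private static final Pattern WORD = Pattern.compile("\\S+");

    /** Tokenizes every value and shifts offsets so the tokens form one stream over `merged`. */
    static List<Token> mergedStream(List<String> values, StringBuilder merged) {
        List<Token> tokens = new ArrayList<>();
        for (int part = 0; part < values.size(); part++) {
            int base = merged.length(); // offset shift for this value
            Matcher m = WORD.matcher(values.get(part));
            while (m.find()) {
                tokens.add(new Token(m.group().toLowerCase(Locale.ROOT),
                                     base + m.start(), base + m.end(), part));
            }
            merged.append(values.get(part)).append(' '); // separator between parts
        }
        return tokens;
    }

    /** Cuts the stream into fragments of at most `size` tokens, never crossing a part boundary. */
    static List<Fragment> fragments(List<Token> tokens, String merged, Set<String> queryTerms, int size) {
        List<Fragment> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int first = i;
            int score = 0;
            while (i < tokens.size() && i - first < size
                    && tokens.get(i).part == tokens.get(first).part) {
                if (queryTerms.contains(tokens.get(i).term)) score++;
                i++;
            }
            out.add(new Fragment(merged.substring(tokens.get(first).start, tokens.get(i - 1).end), score));
        }
        return out;
    }

    /** Returns the text of the `max` highest-scoring fragments across all values. */
    static List<String> bestFragments(List<String> values, Set<String> queryTerms, int size, int max) {
        StringBuilder merged = new StringBuilder();
        List<Token> tokens = mergedStream(values, merged);
        List<Fragment> all = fragments(tokens, merged.toString(), queryTerms, size);
        all.sort((a, b) -> Integer.compare(b.score, a.score)); // stable: ties keep document order
        List<String> best = new ArrayList<>();
        for (Fragment f : all.subList(0, Math.min(max, all.size()))) best.add(f.text);
        return best;
    }
}
```

With the multiple-invocation alternative, each value would produce its own ranked list and the merge step would have to be written separately; here the ranking across parts falls out of the single sort.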



> Add Highlighting to standard request handler
> --------------------------------------------
>
>          Key: SOLR-24
>          URL: http://issues.apache.org/jira/browse/SOLR-24
>      Project: Solr
>         Type: New Feature

>   Components: search
>     Reporter: Mike Klaas
>  Attachments: highlight_patch_v1.diff, highlight_patch_v2.diff, highlight_patch_v3.diff,
>               highlight_patch_v4.diff
>
> This patch adds highlighting functionality to Solr request handlers. It also refactors
> StandardRequestHandler to use the common functionality provided in SolrPluginUtils.  I'd have
> preferred to do two separate patches, but creating two mutually dependent patches on a repo
> without being able to commit a revision was daunting.
> -----------------------------------
> Refactoring StandardRequestHandler:
> 1. Moved solr.util.CommonParams to its own class.  Removed DisMax-specific parameters
>    and placed them in a subclass.
> 2. StandardRequestHandler uses CommonParams to store config-time parameter values (new
>    feature).
> 3. StandardRequestHandler uses SolrPluginUtils methods for duplicate functionality.
> 4. Some of said SPU methods have grown a "params" parameter to enable them to use default
>    values.  (Note: instead of passing this around, something like a RequestHelper class which
>    carries the SolrRequest and Param values would be useful.  This class could house the utility
>    methods that require Request parameters.)
> 5. SolrPluginUtils.getParam() uses the default value only if the parameter is null, not if it
>    is blank.
> --------------------------------------
> Highlighting:
> 1. Highlighting is controlled by three request parameters:
>    highlight: list of fields to highlight; if present without a field list, the default
>    field is highlighted
>    maxSnippets: maximum number of snippets to return for each field
>    highlightFormatterClass: 'solr.<classname>' or full package path of the highlight.Formatter
>    subclass to use in highlighting.
> 2. The default formatter uses <em> tags.  There are issues with this approach, but they
>    are mitigated by the ability to specify a custom Formatter.  We should definitely consider
>    alternatives (a custom XML approach to denote highlighted regions will require some
>    Highlighter package hackery).
> 3. Document summaries are returned as a separate element under <response>; the format
>    is still up for discussion.
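
Refactoring item 5 in the description above distinguishes a null parameter from a blank one. A small sketch of that rule, assuming request parameters are held in a plain Map; ParamDefaultSketch and this getParam signature are hypothetical stand-ins, not the actual SolrPluginUtils code.

```java
import java.util.Map;

/** Illustrative only: a stand-in for the described behaviour, not SolrPluginUtils itself. */
class ParamDefaultSketch {

    /**
     * Returns the request value for `name`, falling back to `def` only when the
     * parameter is absent (null). A blank value is treated as deliberately
     * supplied and is returned unchanged.
     */
    static String getParam(Map<String, String> request, String name, String def) {
        String value = request.get(name);
        return value == null ? def : value;
    }
}
```

Under this rule a request can override a configured default with an empty value, which would not be possible if blank values also fell back to the default.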

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

