lucene-solr-dev mailing list archives

From "Pieter Berkel (JIRA)" <>
Subject [jira] Commented: (SOLR-295) Implementing MoreLikeThis support in DismaxRequestHandler
Date Tue, 10 Jul 2007 07:16:04 GMT


Pieter Berkel commented on SOLR-295:

Thanks Ryan. I missed the original thread mentioned in SOLR-281, but I completely agree with
the line of thinking and the proposals (I was actually thinking the same when I made the above
patch).  There is little point in duplicating code across request handlers (leading to code
bloat, as you suggested); refactoring common functionality into separate components will
ensure consistency in the response format across all handlers.

I'll take a look at the patch submitted on SOLR-281 and see what I can do in terms of implementing
my MLT ideas. However, until the 'search component' framework concept has really been 'solidified',
I'm afraid it's going to be difficult to extend.


> Implementing MoreLikeThis support in DismaxRequestHandler
> ---------------------------------------------------------
>                 Key: SOLR-295
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Pieter Berkel
>            Priority: Minor
>         Attachments: MoreLikeThis-DismaxRequestHandler_SOLR-295.patch
> There's nothing too clever about this initial patch, to be uploaded shortly: I have simply
extracted the MLT code from the StandardRequestHandler and inserted it into the DismaxRequestHandler.
 However, there are some broader MLT issues that I'd also like to address in the near future:
> 1) (trivial) No "This response format is experimental" warning when MLT is used with
StandardRequestHandler (or DismaxRequestHandler).  Not really a big deal, but the warning would
at least make developers aware of the possibility of future changes.
> 2) (trivial) "org.apache.solr.common.util.MoreLikeThisParams" should perhaps be moved
to the more appropriate package "org.apache.solr.common.params".
> 3) (non-trivial) The ability to specify the list of fields that should be returned when
MLT is invoked from an external handler (e.g. StandardRequestHandler).  Currently the field
list (FL) parameter is inherited from the main query, but I can envisage cases where it would
be desirable to return more or fewer fields in the MLT query than in the main query. 
One complication is that "mlt.fl" is already used to specify the fields used for similarity.
 Perhaps "mlt.fl" is not the best name for this parameter and should be renamed to avoid potential
conflict / confusion?
> 4) (fairly-trivial) On a similar note to 3, there is currently no way to specify a "start"
value for the rows returned when MLT is invoked from an external handler (e.g. StandardRequestHandler);
it is hard-coded to 0 (i.e. the first "mlt.count" documents matched).  While I can see the
logic in naming the parameter "mlt.count", it does seem a little inconsistent, and perhaps
it would be better to rename (or at least alias) it to "mlt.rows" to be consistent with the
CommonQueryParameters.  Note that "mlt.start" is fundamentally different from the "mlt.match.offset"
parameter, as the latter deals with documents *matching* the initial MLT query while the former
deals with documents *returned* by the MLT query (hope that makes sense).
> I have created a patch that implemented "mlt.start" (to specify the start doc) and added
"mlt.rows" that could be used interchangeably with "mlt.count" (but I would prefer to remove
"mlt.count" altogether), but since it involves changing the method definition of MoreLikeThisHelper.getMoreLikeThese(),
I wanted to get some opinions before submitting it.
> 5) (non-trivial) Interesting Terms - the ability to return interesting term information
using the "mlt.interestingTerms" parameter when MLT is invoked from an external handler. 
This is perhaps the most useful feature I am looking to implement: I can see great benefit
in being able to provide a list of interesting terms or "keywords" for each document returned
in a standard or dismax query.  Currently this is only available from the MLT request handler,
so perhaps the best approach would be to refactor the "interestingTerms" code in the MoreLikeThisHandler
class and put it somewhere in MoreLikeThisHelper so it is available to all handlers?  Again,
I would appreciate any comments or suggestions.
> I've also noted the MLT features suggested by Tristan [
] which could quite possibly be rolled together with the above points. I'm not sure whether
it is better to have a single ticket tracking several related issues or to create individual
tickets for each issue; however, I will be happy to comply with the Solr issue tracking policy
on advice from the core developers.
> regards,
> Pieter
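
The paging behaviour proposed in point 4 above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the patch mentioned in the description: the class MltPagingSketch, its methods, and the default of 5 rows are hypothetical, and nothing here uses the actual signature of MoreLikeThisHelper.getMoreLikeThese(). The sketch treats "mlt.rows" as the preferred name with "mlt.count" as an alias, and applies "mlt.start" to the documents *returned* by the MLT query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MltPagingSketch {
    // Assumed default row count, chosen for illustration only.
    static final int DEFAULT_ROWS = 5;

    /** Returns the integer value of the first parameter name that is present, else the fallback. */
    static int firstInt(Map<String, String> params, int fallback, String... names) {
        for (String name : names) {
            String value = params.get(name);
            if (value != null) {
                return Integer.parseInt(value.trim());
            }
        }
        return fallback;
    }

    /** Applies "mlt.start" and "mlt.rows" (alias "mlt.count") to a ranked list of similar documents. */
    static <T> List<T> page(List<T> similarDocs, Map<String, String> params) {
        int start = Math.max(0, firstInt(params, 0, "mlt.start"));
        // "mlt.rows" wins when both names are supplied; "mlt.count" is the fallback alias.
        int rows = Math.max(0, firstInt(params, DEFAULT_ROWS, "mlt.rows", "mlt.count"));
        int from = Math.min(start, similarDocs.size());
        // Long arithmetic avoids overflow when a very large row count is requested.
        int to = (int) Math.min((long) from + rows, similarDocs.size());
        return new ArrayList<>(similarDocs.subList(from, to));
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("mlt.start", "1");
        params.put("mlt.count", "2");
        List<String> similar = Arrays.asList("a", "b", "c", "d");
        System.out.println(page(similar, params)); // prints [b, c]
    }
}
```

With neither "mlt.rows" nor "mlt.count" set, the sketch falls back to the assumed default and a start of 0, which matches the currently hard-coded behaviour described in point 4.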

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
