lucene-dev mailing list archives

From "Noble Paul (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
Date Tue, 28 Oct 2014 12:54:34 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186775#comment-14186775 ]

Noble Paul commented on SOLR-6248:
----------------------------------

Doesn't it make sense to put an example query in the description?

> MoreLikeThis Query Parser
> -------------------------
>
>                 Key: SOLR-6248
>                 URL: https://issues.apache.org/jira/browse/SOLR-6248
>             Project: Solr
>          Issue Type: New Feature
>          Components: query parsers
>            Reporter: Anshum Gupta
>            Assignee: Anshum Gupta
>         Attachments: SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch,
SOLR-6248.patch
>
>
> MLT Component doesn't let people highlight/paginate and the handler comes with a cost
of maintaining another piece in the config. Also, any changes to the defaults of the /select
handler (number of results to be fetched etc.) need to be copied/synced with this handler too.
> Having an MLT QParser would let users get back docs based on a query for them to paginate,
highlight etc. It would also give them the flexibility to use this anywhere, i.e. q, fq, bq etc.
> A bit of history about MLT (thanks to Hoss)
> MLT Handler pre-dates the existence of QParsers and was meant to take an arbitrary query
as input, find docs that match that 
> query, club them together to find interesting terms, and then use those 
> terms as if they were my main query to generate a main result set.
> This result would then be used as the set to facet, highlight etc.
> The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)
> The MLT component, on the other hand, serves a very different purpose: augmenting the
main result set. It is used to get similar docs for each doc in the main result set.
> DocSet(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
> The new approach:
> All of this can be done better and cleaner (and makes more sense too) using an MLT QParser.
> An important thing to handle here is the case where the user doesn't have TermVectors,
in which case it should do what happens right now, i.e. parse the stored fields.
> Also, in case the user doesn't have a field (to be used for MLT) indexed, the field would
need to be a TextField with an index analyzer defined. This analyzer will then be used to
extract terms for MLT.
> In case of SolrCloud mode, '/get-termvectors' can be used after looking at the schema
(if TermVectors are enabled for the field). If not, a /get call can be used to fetch the field
and parse it.
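
The handler flow quoted above (Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)) can be sketched roughly as follows. This is only an illustration on a made-up in-memory corpus, not Solr or Lucene code: the names DOCS, search, interesting_terms and more_like_this are hypothetical, and the tf*idf weighting is an assumed stand-in for however "interesting terms" are actually selected.

```python
import math
import re
from collections import Counter

# Toy corpus standing in for an index; ids and text are invented for illustration.
DOCS = {
    "1": "solr query parser for more like this",
    "2": "more like this handler in solr",
    "3": "cooking pasta with tomato sauce",
    "4": "lucene query parser syntax",
}


def tokenize(text):
    """Very small stand-in for an index analyzer: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(terms, exclude=()):
    """OR query: rank docs by how many distinct query terms they contain."""
    wanted = set(terms)
    scores = {}
    for doc_id, text in DOCS.items():
        if doc_id in exclude:
            continue
        overlap = len(wanted & set(tokenize(text)))
        if overlap:
            scores[doc_id] = overlap
    # Highest overlap first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def interesting_terms(doc_ids, max_terms=3):
    """Bag (terms): pool the seed docs' tokens and keep the top tf*idf ones."""
    n = len(DOCS)
    df = Counter(t for text in DOCS.values() for t in set(tokenize(text)))
    tf = Counter(t for d in doc_ids for t in tokenize(DOCS[d]))
    weight = {t: tf[t] * math.log(1 + n / df[t]) for t in tf}
    return sorted(weight, key=lambda t: (-weight[t], t))[:max_terms]


def more_like_this(query, m=1):
    seeds = search(tokenize(query))[:m]    # Query -> DocList(m)
    terms = interesting_terms(seeds)       # DocList(m) -> Bag (terms)
    return search(terms, exclude=seeds)    # Bag (terms) -> Query -> DocList(y)
```

The component flow (DocSet(n) -> n * DocList(m)) would amount to running the last two steps once per doc in the main result set instead of once for the pooled seeds.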

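On the example query asked about in the comment: one possible shape, assuming the parser is invoked with local-params syntax as `{!mlt qf=<fields>}<doc id>` (the form documented for the MLT query parser in later Solr releases; the patch discussed here may differ), is sketched below. The helper name mlt_params and the field name "name" are hypothetical; the point is only that the parser sits in `q`, so the usual paging and highlighting parameters apply to it.

```python
from urllib.parse import urlencode


def mlt_params(doc_id, fields, rows=10, start=0):
    """Build /select request parameters that use an MLT qparser in q.

    Assumed syntax: {!mlt qf=field1,field2}<doc id>. Paging (rows/start)
    and highlighting (hl) are ordinary /select parameters.
    """
    local = "{!mlt qf=%s}%s" % (",".join(fields), doc_id)
    return urlencode({"q": local, "rows": rows, "start": start, "hl": "true"})


# Query string to append to a /select URL of a (hypothetical) collection.
query_string = mlt_params("1", ["name"])
```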


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

