lucene-java-user mailing list archives

From Elmer <evanchaste...@gmail.com>
Subject Re: MultiFieldQueryParser with default AND and stopfilter
Date Thu, 09 Jun 2011 12:58:42 GMT
Thank you Trejkaz!
Inspired by your solution, I've created the attached extension to the
MFQP, which works a little differently from what you proposed. In
getFieldQuery, if a (stop)word is removed by the analyzer for some field,
it returns null, so that term is ignored (only when AND is the default
operator). Afterwards, the parse method parses the input again, this time
using the original MFQP implementation, and combines both queries by
taking their union.
The query 'the best project' now gets parsed as:

(+(title:best description:best authors.name:best) +(title:project
description:project authors.name:project)) (+(authors.name:the)
+(title:best description:best authors.name:best) +(title:project
description:project authors.name:project))

Here the title and description fields use a stop filter. The advantage
of this implementation is that queries containing only stopwords (like
"to be or not to be") are still matched against the fields that have no
stop filter. Moreover, the scores will probably reflect relevance better.
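
As a rough illustration of the two-pass idea (this is not the attached
class, and it deliberately avoids the Lucene API), the sketch below models
the behaviour in plain Java. The names StopwordUnionSketch, survives,
buildPass and parse, the stop word set, and the whitespace tokenisation are
all assumptions made for the example. Returning a single query when the
strict pass is empty or identical to the standard pass is also an
assumption, not something stated above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (no Lucene dependency) of a strict pass unioned with a standard pass. */
class StopwordUnionSketch {

    // Hypothetical setup: title and description drop stop words,
    // authors.name keeps every term.
    static final Set<String> STOP_WORDS = Set.of("the", "to", "be", "or", "not");
    static final Map<String, Boolean> FIELDS = new LinkedHashMap<>();
    static {
        FIELDS.put("title", true);
        FIELDS.put("description", true);
        FIELDS.put("authors.name", false);
    }

    /** Stand-in for per-field analysis: true if the term survives that field's analyzer. */
    static boolean survives(String field, String term) {
        return !(FIELDS.get(field) && STOP_WORDS.contains(term));
    }

    /**
     * Builds one required clause per term. In strict mode, a term that any
     * field removes is skipped entirely (the "return null" case).
     */
    static String buildPass(String[] terms, boolean strict) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            if (term.isEmpty()) continue;
            List<String> perField = new ArrayList<>();
            for (String field : FIELDS.keySet()) {
                if (survives(field, term)) perField.add(field + ":" + term);
            }
            boolean removedSomewhere = perField.size() < FIELDS.size();
            if (perField.isEmpty() || (strict && removedSomewhere)) continue;
            clauses.add("+(" + String.join(" ", perField) + ")");
        }
        return String.join(" ", clauses);
    }

    /** Runs both passes and joins them as two optional clauses (the union). */
    static String parse(String input) {
        String[] terms = input.trim().toLowerCase().split("\\s+");
        String strict = buildPass(terms, true);
        String standard = buildPass(terms, false);
        // Assumption: when the strict pass adds nothing, keep only the standard query.
        if (strict.isEmpty() || strict.equals(standard)) return standard;
        return "(" + strict + ") (" + standard + ")";
    }
}
```

Calling StopwordUnionSketch.parse("the best project") in this model yields
a string with the same shape as the parsed query shown above, and a query
made up only of stop words falls back to clauses on authors.name alone.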

BR,
Elmer



On Thu, 2011-06-09 at 07:32 +1000, Trejkaz wrote:
> On Wed, Jun 8, 2011 at 6:52 PM, Elmer <evanchastelet@gmail.com> wrote:
> > the parsed query becomes:
> >
> > '+(title:the) +(title:project desc:project)'.
> >
> > So, the problem is that docs that have the term 'the' only appearing in
> > their desc field are excluded from the results.
> 
> Subclass MFQP and override getFieldQuery.
> 
> If the field is null then MFQP will hand you back a BooleanQuery - if
> the number of terms in this is lower than the number of fields then
> some of them must have been removed because they were stop words.  If
> this occurs, replace the whole BooleanQuery with a MatchAllDocsQuery.
> 
> Then you will effectively get:
> 
>     +(*:*) +(title:project desc:project)
> 
> And then in getBooleanQuery you could optimise the query to take out
> MatchAllDocsQuery if it isn't necessary in a boolean query.
> 
> TX

