lucene-solr-user mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: dismax and long phrases
Date Sun, 05 Oct 2008 18:08:56 GMT
Hmmm, tricky.  I think you've uncovered an algorithmic flaw in DisMax.

Consider 2 fields, f1 and f2, and 2 terms, foo and bar.  For illustration
purposes, here is a query that's structurally equivalent (assuming
mm=100%, i.e. all of the terms must match):

+(f1:foo OR f2:foo) +(f1:bar OR f2:bar)

OK, so it says that "foo" must appear in either field and "bar" must
appear in either field.  So far so good.

Now consider what happens if "bar" is a stopword for f1.  The analyzer
for f1 drops it, so the f1:bar clause disappears and the query becomes

+(f1:foo OR f2:foo) +(f2:bar)

Oops, now this query is saying that bar *must* appear in f2... it's
more restrictive than the first query.  It appears that dismax is a
bit broken when some of the fields have stopwords and some don't.
Offhand, I don't see an easy fix for this problem.
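
To make the effect concrete, here is a minimal toy model in Python.  It is
not Solr or Lucene code: the helper names (index_doc, build_clauses,
matches) and the sample document are made up for illustration, and it
only mimics the clause structure above with mm=100%.

```python
# Toy model of the per-term clause structure, NOT actual Solr/Lucene code.
# Assumptions: whitespace tokenization, lowercase terms, and a per-field
# stopword set applied at both index time and query time.

def index_doc(text_by_field, stopwords_by_field):
    """Index each field: split on whitespace and drop that field's stopwords."""
    return {
        field: {t for t in text.lower().split()
                if t not in stopwords_by_field[field]}
        for field, text in text_by_field.items()
    }

def build_clauses(terms, stopwords_by_field):
    """One clause per query term: the (field, term) pairs that survive
    each field's stopword filter."""
    clauses = []
    for term in terms:
        clause = [(field, term)
                  for field, stops in stopwords_by_field.items()
                  if term not in stops]
        if clause:  # a term dropped by every field produces no clause
            clauses.append(clause)
    return clauses

def matches(doc, clauses):
    """mm=100%: every clause must be satisfied by at least one field."""
    return all(
        any(term in doc.get(field, set()) for field, term in clause)
        for clause in clauses
    )

raw_doc = {"f1": "foo bar", "f2": ""}   # both words occur only in f1
query = ["foo", "bar"]

# Case 1: no stopwords anywhere -> +(f1:foo f2:foo) +(f1:bar f2:bar)
no_stops = {"f1": set(), "f2": set()}
print(matches(index_doc(raw_doc, no_stops), build_clauses(query, no_stops)))

# Case 2: "bar" is a stopword for f1 -> +(f1:foo f2:foo) +(f2:bar)
f1_stops = {"f1": {"bar"}, "f2": set()}
print(build_clauses(query, f1_stops))
print(matches(index_doc(raw_doc, f1_stops), build_clauses(query, f1_stops)))
```

In this sketch the document matches in case 1 but not in case 2: the
second clause is left with f2:bar only, and f2 does not contain "bar".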

-Yonik



On Fri, Oct 3, 2008 at 5:44 PM, Jon Drukman <jdrukman@gmail.com> wrote:
> i have a document with the following field
>
> <name>Saying goodbye to Norman</name>
>
> if i search for "saying goodbye to norman" with the standard query, it works
> fine.  if i specify dismax, however, it does not match.  here's the output
> of debugQuery, which I don't understand at all:
>
> <str name="rawquerystring">saying goodbye to norman</str>
> <str name="querystring">saying goodbye to norman</str>
> <str name="parsedquery">+((DisjunctionMaxQuery((user_name:saying^0.4 |
> description:say | tags:say^0.5 | misc:say^0.3 | group_name:say^1.5 |
> location:saying^0.6 | name:say^1.5)~0.01)
> DisjunctionMaxQuery((user_name:goodbye^0.4 | description:goodby |
> tags:goodby^0.5 | misc:goodby^0.3 | group_name:goodby^1.5 |
> location:goodbye^0.6 | name:goodby^1.5)~0.01)
> DisjunctionMaxQuery((user_name:to^0.4 | location:to^0.6)~0.01)
> DisjunctionMaxQuery((user_name:norman^0.4 | description:norman |
> tags:norman^0.5 | misc:norman^0.3 | group_name:norman^1.5 |
> location:norman^0.6 | name:norman^1.5)~0.01))~4)
> DisjunctionMaxQuery((description:"say goodby norman"~100 | group_name:"say
> goodby norman"~100^1.5 | name:"say goodby norman"~100^1.5)~0.01)</str>
> <str name="parsedquery_toString">+(((user_name:saying^0.4 | description:say
> | tags:say^0.5 | misc:say^0.3 | group_name:say^1.5 | location:saying^0.6 |
> name:say^1.5)~0.01 (user_name:goodbye^0.4 | description:goodby |
> tags:goodby^0.5 | misc:goodby^0.3 | group_name:goodby^1.5 |
> location:goodbye^0.6 | name:goodby^1.5)~0.01 (user_name:to^0.4 |
> location:to^0.6)~0.01 (user_name:norman^0.4 | description:norman |
> tags:norman^0.5 | misc:norman^0.3 | group_name:norman^1.5 |
> location:norman^0.6 | name:norman^1.5)~0.01)~4) (description:"say goodby
> norman"~100 | group_name:"say goodby norman"~100^1.5 | name:"say goodby
> norman"~100^1.5)~0.01</str>
>
>
>
> it works fine if I search for "say goodbye" or "saying goodbye" or "saying
> goodbye norman".  how can i get it to do exact matches (which should score
> very high)?
>
>
> -jsd-
>
>
