lucene-dev mailing list archives

From "Chuck Williams" <ch...@manawiz.com>
Subject RE: Contribution: better multi-field searching
Date Wed, 13 Oct 2004 20:32:15 GMT
I think that works. When originally looking for solutions, I hadn't
thought of overriding the BooleanQuery.getSimilarity() method
selectively.  You obviously have more familiarity with these classes
:-).

Thanks,

Chuck


> -----Original Message-----
> From: Doug Cutting [mailto:cutting@apache.org]
> Sent: Wednesday, October 13, 2004 12:56 PM
> To: Lucene Developers List
> Subject: Re: Contribution: better multi-field searching
> 
> Chuck Williams wrote:
> > That approach does not work.  I could not find an approach that would
> > work with the built-in classes, although of course there might be one.
> > The problem has two components:  coord and the fact that BooleanQuery's
> > sum their clause scores to compute the final score.  The latter is not
> > easily overridden.  Specifically,
> >
> >   title:(albino elephant)^4 description:(albino elephant)
> >
> > still has the problem that a result with albino in the title and albino
> > in the description gets the same score as a result with albino in the
> > title and elephant in the description
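
[Editor's sketch] The tie described above can be reproduced with a toy model. This is not Lucene code: it assumes every matching term contributes a raw score of 1.0, ignores tf, idf and norms, and models a BooleanQuery as coord times the sum of its matching clauses. The names boolean_score, field_group and flat_query are hypothetical.

```python
def boolean_score(clause_scores):
    """Toy BooleanQuery: coord * sum of the matching clause scores."""
    matched = [s for s in clause_scores if s > 0]
    coord = len(matched) / len(clause_scores)
    return coord * sum(matched)


def field_group(doc, field, terms, boost):
    """Toy version of field:(t1 t2)^boost, with every hit scoring 1.0 (assumption)."""
    hits = [1.0 if t in doc.get(field, set()) else 0.0 for t in terms]
    return boost * boolean_score(hits)


def flat_query(doc):
    """title:(albino elephant)^4 description:(albino elephant)"""
    terms = ("albino", "elephant")
    return boolean_score([
        field_group(doc, "title", terms, 4.0),
        field_group(doc, "description", terms, 1.0),
    ])


# Document A repeats one term in both fields; document B covers both terms.
doc_a = {"title": {"albino"}, "description": {"albino"}}
doc_b = {"title": {"albino"}, "description": {"elephant"}}

print(flat_query(doc_a), flat_query(doc_b))
```

Under these assumptions both documents match one of two terms in each field group, so summing the groups cannot tell them apart.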
> 
> Perhaps I misunderstood what you desire.  You want a reward for albino
> and elephant both occurring in the document, regardless of field?  If so,
> then what you'd want is:
> 
> (title:albino description:albino) (title:elephant description:elephant)
> 
> with coord disabled on the *inner* queries, no?  This way coord would
> explicitly boost documents which matched on both terms.
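
[Editor's sketch] A toy model of the per-term grouping suggested above, with coord applied only on the outer query. It is not Lucene code: each field hit is assumed to score 1.0, and inner_sum and outer_score are hypothetical names.

```python
FIELDS = ("title", "description")
TERMS = ("albino", "elephant")


def inner_sum(doc, term):
    """Inner query (title:term description:term) with coord disabled: a plain sum."""
    return sum(1.0 for f in FIELDS if term in doc.get(f, set()))


def outer_score(doc):
    """Outer query: sum of the per-term groups times coord = matched groups / groups."""
    groups = [inner_sum(doc, t) for t in TERMS]
    overlap = sum(1 for g in groups if g > 0)
    return (overlap / len(groups)) * sum(groups)


doc_a = {"title": {"albino"}, "description": {"albino"}}    # one term, two fields
doc_b = {"title": {"albino"}, "description": {"elephant"}}  # both terms

print(outer_score(doc_a), outer_score(doc_b))
```

Here the outer coord counts distinct terms rather than fields, so the document containing both terms scores higher in this simplified model.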
> 
> > FYI, MaxDisjunctionQuery has made an enormous improvement in the quality
> > of my query results, and I have strong reason to believe the same would
> > be true in most other domains (more on that coming in the idf^2
> > discussion).  In terms of the albino elephant example, the query above
> > was putting all the albino animals except elephants above the albino
> > elephants, while the query with an outer BooleanQuery and inner
> > MaxDisjunctionQuery's
> >
> >     ( (title:albino^4 | description:albino)~0.1
> >       (title:elephant^4 | description:elephant)~0.1
> >     )
> >
> > properly puts the albino elephants on top.
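
[Editor's sketch] A toy model of the nested query above. It assumes that the ~0.1 suffix is a tie-breaker multiplier, so that a disjunction scores as its best clause plus 0.1 times the remaining clauses; the message itself does not spell this out. Every raw hit scores 1.0 before boosts, and max_disjunction, term_score and query_score are hypothetical names, not the contributed class.

```python
TIE_BREAKER = 0.1
BOOSTS = {"title": 4.0, "description": 1.0}
TERMS = ("albino", "elephant")


def max_disjunction(scores, tie_breaker):
    """Best clause score plus tie_breaker times the sum of the other clauses."""
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def term_score(doc, term):
    """(title:term^4 | description:term)~0.1 with unit raw scores (assumption)."""
    scores = [boost if term in doc.get(field, set()) else 0.0
              for field, boost in BOOSTS.items()]
    return max_disjunction(scores, TIE_BREAKER)


def query_score(doc):
    """Outer BooleanQuery: coord * sum over the per-term disjunctions."""
    per_term = [term_score(doc, t) for t in TERMS]
    overlap = sum(1 for s in per_term if s > 0)
    return (overlap / len(per_term)) * sum(per_term)


doc_a = {"title": {"albino"}, "description": {"albino"}}
doc_b = {"title": {"albino"}, "description": {"elephant"}}

print(query_score(doc_a), query_score(doc_b))
```

In this model a second field match for the same term adds only the small tie-breaker increment, while a match on a second term adds a full clause and raises coord.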
> 
> If "albino" is outscoring "elephant" then you could either reduce the
> impact of idf or increase the impact of coordination.  Did you try,
> e.g., defining coord as (overlap/max)^2 or somesuch?
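
[Editor's sketch] A toy illustration of raising coord to a power. The raw sums below are made-up numbers chosen so that a single rare term outscores two common ones; coord and score are hypothetical helpers, not Lucene's Similarity API.

```python
def coord(overlap, max_overlap, exponent=1.0):
    """(overlap / max_overlap) ** exponent; exponent 1.0 is the plain ratio."""
    return (overlap / max_overlap) ** exponent


def score(raw_sum, overlap, max_overlap, exponent):
    """Raw clause sum scaled by the coordination factor."""
    return raw_sum * coord(overlap, max_overlap, exponent)


# Hypothetical raw sums: one strong match on a single term vs. two weaker matches.
one_term = {"raw_sum": 8.0, "overlap": 1}
both_terms = {"raw_sum": 3.0, "overlap": 2}

for exponent in (1.0, 2.0):
    print(exponent,
          score(one_term["raw_sum"], one_term["overlap"], 2, exponent),
          score(both_terms["raw_sum"], both_terms["overlap"], 2, exponent))
```

With exponent 1 the single-term document wins (4.0 vs 3.0); with exponent 2 the penalty for the missing term doubles in strength and the order flips (2.0 vs 3.0).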
> 
> Or, perhaps take proximity into account, with "albino elephant"~10?  Or
> simply using AND instead of OR?  These days most web search engines use
> AND as the default operator and reward for proximity.  Is that wrong for
> your application?  AND is effectively a coord of (overlap/max)^infinity.
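
[Editor's sketch] The limiting case can be checked numerically. This toy model treats a document as a hit when its scaled score is nonzero; the document names and raw sums are hypothetical, and hits is not a Lucene call.

```python
import math


def coord(overlap, max_overlap, exponent):
    """(overlap / max_overlap) ** exponent."""
    return (overlap / max_overlap) ** exponent


def hits(raw_scores, max_overlap, exponent):
    """raw_scores maps doc id -> (raw_sum, overlap); keep docs with a nonzero score."""
    return sorted(
        doc_id
        for doc_id, (raw_sum, overlap) in raw_scores.items()
        if raw_sum * coord(overlap, max_overlap, exponent) > 0
    )


# Hypothetical two-term query results: one partial match, one full match.
docs = {"albino_rabbit": (8.0, 1), "albino_elephant": (3.0, 2)}

print(hits(docs, 2, 1.0))       # OR-like: both documents survive
print(hits(docs, 2, math.inf))  # AND-like: only the full match survives
```

Any overlap ratio below 1 goes to zero as the exponent grows, while a full match keeps a factor of exactly 1, which is why the infinite exponent behaves like a required-terms filter.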
> 
> Doug
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

