lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Lucene Proximity Searches
Date Fri, 18 Apr 2008 15:23:39 GMT
Ana,

On Friday 18 April 2008 12:41:38, Ana Rabade wrote:
> I am using ngrams and I need to force that a group of them are
> together, but if any of them fails, I need that the document is also
> scored. Perhaps you could help me to find the solution or give me a
> reference of which changes I must do. I am using SpanNearQuery,
> because the ngrams must be in order. Thanks for your answer.
>    - Ana Maria Freire Veiga -

Assuming that K terms are involved and K-1 of them need to match
in order as ngrams, there are the following options:

- create K SpanNearQuery's on K-1 ordered terms with appropriate
slop, add these to a BooleanQuery using Occur.SHOULD, and
search this BooleanQuery.

- starting from the same K SpanNearQuery's on K-1 terms,
search each of these separately and use your own HitCollector to
combine the scores.
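
As a rough illustration of how the K term sequences for these two options could be built, here is a minimal plain-Java sketch. The class and method names (LeaveOneOutNgrams, leaveOneOut, slopFor) are hypothetical and not part of Lucene. The slop rule is an assumption: it supposes the ngrams occupy consecutive positions, so dropping an interior term leaves a gap of one position. Each resulting sequence would then be turned into one ordered SpanNearQuery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LeaveOneOutNgrams {

    /** Returns the K ordered term sequences obtained by dropping one term at a time. */
    static List<List<String>> leaveOneOut(List<String> terms) {
        List<List<String>> result = new ArrayList<>();
        for (int skip = 0; skip < terms.size(); skip++) {
            List<String> sub = new ArrayList<>(terms);
            sub.remove(skip); // int argument: removes the element at that index
            result.add(sub);
        }
        return result;
    }

    /**
     * Assumed slop rule (not taken from Lucene): dropping an interior term
     * leaves a one-position gap, so one extra position of slop is allowed;
     * dropping the first or last term leaves no gap.
     */
    static int slopFor(int skip, int k, int baseSlop) {
        boolean interior = skip > 0 && skip < k - 1;
        return interior ? baseSlop + 1 : baseSlop;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("ab", "bc", "cd", "de");
        List<List<String>> subs = leaveOneOut(terms);
        int total = 0;
        for (int skip = 0; skip < subs.size(); skip++) {
            total += subs.get(skip).size();
            System.out.println(subs.get(skip) + " slop=" + slopFor(skip, terms.size(), 0));
        }
        // K sequences of K-1 terms each: the total grows as K * (K - 1).
        System.out.println("total terms: " + total);
    }
}
```

For the first option, each sequence would become a SpanNearQuery added to a BooleanQuery with Occur.SHOULD; for the second, each would be searched on its own.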

For these two options, one could also use the SpanNearQuery on all
K terms to influence the scoring somewhat. The problem with these
options is that K queries of K-1 terms each give K(K-1) terms in
total, so the query size is quadratic in K, possibly giving
performance problems for higher values of K.
In that case, try the third option:

- modify the code of the NearSpansOrdered class in 
the org.apache.lucene.search.spans package to allow
a match for fewer than all subqueries. This is not going to
be straightforward, but it is possible. In case you choose
this last option, please continue on the java-dev list.
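
To give a feel for what "a match for fewer than all subqueries" means, here is a simplified sketch of the matching rule only. It is not the NearSpansOrdered code and uses no Lucene classes: it works on an in-memory token array, assumes slop 0 (each term expected at a fixed offset from the window start), and the names PartialOrderedMatch, matches and maxMissing are hypothetical.

```java
class PartialOrderedMatch {

    /**
     * Returns true if some window of the document contains at least
     * (terms.length - maxMissing) of the query terms, each at its
     * expected offset from the window start (ordered, slop 0).
     */
    static boolean matches(String[] tokens, String[] terms, int maxMissing) {
        int k = terms.length;
        // The window start may be negative so that leading terms can be the missing ones.
        for (int start = -maxMissing; start < tokens.length; start++) {
            int hits = 0;
            for (int i = 0; i < k; i++) {
                int pos = start + i;
                if (pos >= 0 && pos < tokens.length && tokens[pos].equals(terms[i])) {
                    hits++;
                }
            }
            if (hits >= k - maxMissing) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] terms = {"ab", "bc", "cd", "de"};
        String[] doc = {"xx", "ab", "bc", "zz", "de"};
        // One interior term is absent from the document; allow one miss.
        System.out.println(matches(doc, terms, 1));
    }
}
```

A real change to NearSpansOrdered would have to do this incrementally over the subspans and handle slop, which is where the difficulty lies.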

Regards,
Paul Elschot

>
> On Fri, Apr 4, 2008 at 12:38 PM, Ana Rabade
> <anafreireveiga@gmail.com>
>
> wrote:
> > I am using ngrams and I need to force that a group of them are
> > together, but if any of them fails, I need that the document is
> > also scored. Perhaps you could help me to find the solution or give
> > me a reference of which changes I must do. I am using
> > SpanNearQuery, because the ngrams must be in order.
> > Thanks for your answer.
> >    - Ana Maria Freire Veiga -
> >
> > On Thu, Apr 3, 2008 at 7:56 PM, Erick Erickson
> > <erickerickson@gmail.com>
> >
> > wrote:
> > > Could you explain your use case? Because to say that you want to
> > > score documents that don't have all the terms with a *phrase
> > > query* is contradictory. The point of a phrase query is exactly
> > > > that all the terms are there and within some proximity.
> > >
> > >
> > > Best
> > > Erick
> > >
> > > On Thu, Apr 3, 2008 at 12:17 PM, Ana Rábade
> > > <anafreireveiga@gmail.com> wrote:
> > > > Hi!
> > > >
> > > > I'm using Lucene Proximity Searches, but I've seen Lucene only
> > > > scores documents which contain all the terms in the phrase. I
> > > > also need to score documents although they don't contain all
> > > > those terms.  Is it possible with Lucene PhraseQueries or
> > > > SpanNearQuery? If not, could you tell me a way to find my
> > > > solution?
> > > >
> > > > Thank you very much.
> > > >
> > > >    - Ana M. Freire -




