lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rebecca Watson <bec.wat...@gmail.com>
Subject Re: how to extend Similarity in this situation?
Date Wed, 02 Jun 2010 06:24:50 GMT
Hi Li Li

If you want to support some query types and not others you should
overide/extend the queryparser so that you throw an exception / makes
a different query type instead.

Similarity doesn't do the actual scoring, it's used by the Query
classes (actually the Scorer implementation used by the Query)
to get tf / idf scores. So scoring is done by the Query types themselves
(the Weight/Scorer classes used by the Query). So sometimes
you change/extend Similarity to affect scoring and sometimes you
need to change/extend the Query/Scorer/Weight classes etc.
Luckily though, there are already some Query types that boost
scores if the terms are closer -- the SpanQuery query types do this.

You will either have to write your own queryparser (to convert query string
to the Query object(s) required,
or you could use e.g. the Surround queryparser supports specifying
SpanQuery queries via user-entered text:
http://www.lucidimagination.com/blog/2009/02/22/exploring-query-parsers/

look in the lucene contrib/surround package for the surround queryparser.

I've messed about with using SpanQuery classes a bit and people report
that they are slower - which is true because they use a lot of extra
info to perform their matching/scoring -- but they are still easily
sub-second over most
queries / reasonable sized doc collection
-- and they only get a lot slower if you have terms that occur very frequently
in the document collection you are searching.
So if you are removing stop words etc you shouldn't have too
many problems efficiency wise.

hope that helps,

bec :)

On 1 June 2010 22:07, Li Li <fancyerii@gmail.com> wrote:
> I want to only support boolean or query(as many search engine do). But
> I want to boost document whose terms are closer.
> e.g. the query terms are 'apache lucene'
> doc1 apache has many projects such as lucene
> doc2 The Apache HTTP Server Project is an effort to develop and
> maintain an ...  Lucene is a .....
> I want to give doc1 more boost.
> So I need a place where I can get all position information
> How to extend Similarity to achieve this?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message