lucene-java-user mailing list archives

From Doron Cohen <cdor...@gmail.com>
Subject Re: Custom scoring for searching geographic objects
Date Wed, 15 Dec 2010 16:10:08 GMT
Also, if you take the Similarity suggestion below, note two aspects of
Lucene's default behavior that you seem to want to avoid:

The first is IDF. It only matters for multi-term queries (for a single-term
query idf is the same factor for every hit, so it does not change the
order); otherwise ignore this comment.
For multi-term queries to consider only term frequency and doc length, you
may want to always return 1 from idf() in your Similarity impl (otherwise
terms appearing in more documents will contribute less to the score, which
you seem to wish to avoid).
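
To make the IDF point concrete, here is a minimal standalone sketch (plain
Java, no Lucene dependency). It assumes the classic DefaultSimilarity
formula, idf = 1 + ln(numDocs / (docFreq + 1)), and compares it with a flat
idf of 1. The names IdfSketch, classicIdf and flatIdf are made up for
illustration; in a 3.0-era setup the equivalent change would be overriding
idf(int docFreq, int numDocs) in your own Similarity subclass.

```java
// Hypothetical illustration only; this is not Lucene code.
final class IdfSketch {

    // Assumed to mirror the classic DefaultSimilarity formula:
    // 1 + ln(numDocs / (docFreq + 1)).
    static float classicIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Flat variant suggested above: every term gets the same weight.
    static float flatIdf(int docFreq, int numDocs) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int numDocs = 9; // size of the example index in this thread
        // "Russia" appears in all 9 docs, "Kremlin" in only 1.
        System.out.println("classic idf(Russia)  = " + classicIdf(9, numDocs));
        System.out.println("classic idf(Kremlin) = " + classicIdf(1, numDocs));
        System.out.println("flat idf(Russia)     = " + flatIdf(9, numDocs));
        System.out.println("flat idf(Kremlin)    = " + flatIdf(1, numDocs));
    }
}
```

With the classic formula the rare term gets a larger weight than the common
one; with the flat variant both terms weigh the same, so only term frequency
and doc length are left to separate the hits.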

The second is doc length normalization inaccuracy. Doc lengths are encoded
lossily (the norm is stored in a single byte), so at search time Lucene
might not distinguish between two documents whose lengths are almost the
same. For this, at indexing time, your Similarity impl for lengthNorm()
could return e.g. 1/(10 * numTokens), which reduces the chances that two
docs of different length end up with the same search-time norm.
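
The collision can be seen with a small standalone sketch. The encode()
method below is written from memory to mimic Lucene's one-byte norm encoding
(SmallFloat.floatToByte315), so treat it as an approximation rather than the
library's code. NormSketch, sqrtNorm and linearNorm are hypothetical names:
sqrtNorm assumes the default 1/sqrt(numTokens), and linearNorm is the
1/(10 * numTokens) suggested above.

```java
// Hypothetical illustration only; this is not Lucene code.
final class NormSketch {

    // Lossy float -> byte encoding, modeled on SmallFloat.floatToByte315.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;          // drop the low-order mantissa bits
        int zero = (63 - 15) << 3;       // offset of the smallest exponent
        if (small <= zero) {
            // Zero/negative maps to 0, tiny positive values map to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zero + 0x100) {
            return (byte) -1;            // too large: clamp to the max code
        }
        return (byte) (small - zero);
    }

    // Assumed default length norm: 1 / sqrt(numTokens).
    static float sqrtNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Length norm suggested in this message: 1 / (10 * numTokens).
    static float linearNorm(int numTokens) {
        return (float) (1.0 / (10.0 * numTokens));
    }

    public static void main(String[] args) {
        // Print the stored byte for short documents under both norms.
        for (int n = 1; n <= 10; n++) {
            System.out.println(n + " tokens: sqrt -> " + encode(sqrtNorm(n))
                    + ", linear -> " + encode(linearNorm(n)));
        }
    }
}
```

Under these assumptions, lengths 3 and 4 map to the same byte with the
square-root norm but to different bytes with the linear one. The linear norm
still collides for longer documents (for example 9 and 10 tokens), so it
reduces the chance of ties rather than removing it.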

Doron

On Wed, Dec 15, 2010 at 5:43 PM, Ian Lea <ian.lea@gmail.com> wrote:

> It sounds to me like Lucene should do a pretty good job without any extra
> work on your part.  See the javadocs for
> org.apache.lucene.search.Similarity
> for details on how it works.  You can change things by providing your
> own implementation.
>
> There is also the org.apache.lucene.search.function package but that
> is much more complex.
>
>
> A web search for "lucene scoring" should find you lots of info.
>
>
> --
> Ian.
>
>
> On Wed, Dec 15, 2010 at 3:28 PM, Pavel Minchenkov <chardex@gmail.com>
> wrote:
> > Hi,
> > Please give me advice on how to create custom scoring. I need the
> > result documents ordered by how popular each term in the document is
> > (popular = how many times it appears in the index) and by the length
> > of the document (fewer terms = higher in search results).
> >
> > For example, the index contains the following data:
> >
> > ID    | SEARCH_FIELD
> > ------------------------------
> > 1     | Russia
> > 2     | Russia, Moscow
> > 3     | Russia, Volgograd
> > 4     | Russia, Ivanovo
> > 5     | Russia, Ivanovo, Altayskaya street 45
> > 6     | Russia, Moscow, Kremlin
> > 7     | Russia, Moscow, Altayskaya street
> > 8     | Russia, Moscow, Altayskaya street 15
> > 9     | Russia, Moscow, Altayskaya street 15/26
> >
> >
> > And I should get the following results:
> >
> >
> > Query       | Document result set
> > ----------------------------------------------
> > Russia      | 1,2,4,3,6,7,8,9,5
> > Moscow      | 2,6,7,8,9
> > Ivanovo     | 4,5
> > Altayskaya  | 7,8,9,5
> >
> > In fact, it is a search for geographic objects (cities, streets,
> > houses). The query may contain only part of the address, and the most
> > relevant results should appear first.
> >
> > Thanks.
> > --
> > Pavel Minchenkov
> >
>
