lucene-java-user mailing list archives

From Alexey Serba <ase...@gmail.com>
Subject Re: Custom scoring for searching geographic objects
Date Sun, 19 Dec 2010 21:35:09 GMT
Hi Pavel,

I had a similar problem several years ago: I had to find
geographical locations in textual descriptions, geocode them
to lat/long at indexing time, and let users filter/sort
search results by specific geographical area. The important issue was
that there were several types of geographical objects: street < town
< region < country. The idea was to geocode to the narrowest
geographical area possible. The relevance logic in this case could be
stated as "find the narrowest result that is uniquely identified by
your text or search query". So I came up with a custom algorithm that
was quite good in terms of performance and precision/recall. Here's
a simple description:
* Intersect all text/search query terms with the locations
dictionary to keep only the geo terms.
* Search your locations Lucene index, filtering to street objects only
(the narrowest areas). Thanks to the tf*idf formula, the most
relevant results come first. Then post-process the top N (3/5/10) results and
verify that they really are matches: I intersected the search terms with
each result's terms and ran another Lucene search to check whether those terms
uniquely identify the match. If they do, return the matching street.
If there is no match, repeat the same algorithm with towns, then
regions, then countries.
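
The steps above can be sketched roughly as follows. This is only an
illustration, not the code I actually used: a plain in-memory list stands in
for the locations Lucene index, there is no tf*idf ranking or top-N cutoff
(candidates are simply tried in list order), and every name in it
(GeoCascade, Location, Level, narrowestMatch, the sample locations) is made
up for the example.

```java
import java.util.*;

/** Narrowest-first lookup over an in-memory stand-in for the locations index. */
final class GeoCascade {
    /** Declared from narrowest to widest, so values() yields the search order. */
    enum Level { STREET, TOWN, REGION, COUNTRY }

    /** Hypothetical index entry: a location name, its level, and its terms. */
    static final class Location {
        final String name;
        final Level level;
        final Set<String> terms;

        Location(String name, Level level) {
            this.name = name;
            this.level = level;
            this.terms = tokenize(name);
        }
    }

    /** Lowercases the text and splits on anything that is not a letter or digit. */
    static Set<String> tokenize(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static Optional<Location> narrowestMatch(String text, List<Location> index) {
        // Step 1: keep only the terms that occur in the locations dictionary.
        Set<String> dictionary = new HashSet<>();
        for (Location l : index) dictionary.addAll(l.terms);
        Set<String> geoTerms = tokenize(text);
        geoTerms.retainAll(dictionary);

        // Step 2: try each level in turn, narrowest first.
        for (Level level : Level.values()) {
            for (Location candidate : index) {
                if (candidate.level != level) continue;
                // Terms shared by the text and this candidate.
                Set<String> shared = new HashSet<>(candidate.terms);
                shared.retainAll(geoTerms);
                if (shared.isEmpty()) continue;
                // Verification: the shared terms must identify exactly one
                // location at this level (stand-in for the second search).
                long hits = index.stream()
                        .filter(l -> l.level == level && l.terms.containsAll(shared))
                        .count();
                if (hits == 1) return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Location> index = Arrays.asList(
                new Location("Altayskaya street", Level.STREET),
                new Location("Lenina street", Level.STREET),
                new Location("Moscow", Level.TOWN),
                new Location("Russia", Level.COUNTRY));
        for (String text : new String[] {
                "Office on Altayskaya street, Moscow, Russia",
                "Somewhere on a street in Moscow"}) {
            System.out.println(text + " -> "
                    + narrowestMatch(text, index).map(l -> l.name).orElse("no match"));
        }
    }
}
```

In the second sample text the only street term is "street", which matches
two streets and so is not unique; the lookup therefore falls through to the
town level.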

HTH,
Alexey

On Wed, Dec 15, 2010 at 6:28 PM, Pavel Minchenkov <chardex@gmail.com> wrote:
> Hi,
> Please give me advice on how to create custom scoring. I need the result
> documents to be ordered by how popular each term in the document is
> (popular = how many times it appears in the index) and by the length of the
> document (fewer terms = higher in the search results).
>
> For example, the index contains the following data:
>
> ID    | SEARCH_FIELD
> ------------------------------
> 1     | Russia
> 2     | Russia, Moscow
> 3     | Russia, Volgograd
> 4     | Russia, Ivanovo
> 5     | Russia, Ivanovo, Altayskaya street 45
> 6     | Russia, Moscow, Kremlin
> 7     | Russia, Moscow, Altayskaya street
> 8     | Russia, Moscow, Altayskaya street 15
> 9     | Russia, Moscow, Altayskaya street 15/26
>
>
> And I should get the following results:
>
>
> Query      | Document result set
> ----------------------------------------------
> Russia     | 1,2,4,3,6,7,8,9,5
> Moscow     | 2,6,7,8,9
> Ivanovo    | 4,5
> Altayskaya | 7,8,9,5
>
> In fact, this is a search for geographic objects (cities, streets, houses).
> A query may contain only part of the address, and the most relevant
> results should appear first.
>
> Thanks.
> --
> Pavel Minchenkov
>
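
For reference, the ordering in the quoted table can be written down as a
plain sort key. This is my reading of the requirement, not something Pavel
specified in this form: fewer terms first, then higher total term popularity
(sum of document frequencies), then lower ID as a tie-break. The sketch
handles single-term queries only, splits on commas and whitespace, and uses
no Lucene API; AddressRanker, sampleDocs and search are made-up names.

```java
import java.util.*;

/** Sorts matching documents by (term count asc, popularity desc, id asc). */
final class AddressRanker {
    /** The nine documents from the quoted example. */
    static Map<Integer, String> sampleDocs() {
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(1, "Russia");
        docs.put(2, "Russia, Moscow");
        docs.put(3, "Russia, Volgograd");
        docs.put(4, "Russia, Ivanovo");
        docs.put(5, "Russia, Ivanovo, Altayskaya street 45");
        docs.put(6, "Russia, Moscow, Kremlin");
        docs.put(7, "Russia, Moscow, Altayskaya street");
        docs.put(8, "Russia, Moscow, Altayskaya street 15");
        docs.put(9, "Russia, Moscow, Altayskaya street 15/26");
        return docs;
    }

    /** Lowercases the text and splits on commas and whitespace. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[,\\s]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Returns the IDs of documents containing the term, best first. */
    static List<Integer> search(String term, Map<Integer, String> docs) {
        // Document frequency: in how many documents each term appears.
        Map<String, Integer> docFreq = new HashMap<>();
        Map<Integer, List<String>> termsById = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            List<String> terms = tokenize(e.getValue());
            termsById.put(e.getKey(), terms);
            for (String t : new HashSet<>(terms)) docFreq.merge(t, 1, Integer::sum);
        }
        // Popularity of a document: sum of the document frequencies of its terms.
        Map<Integer, Integer> popularity = new HashMap<>();
        for (Map.Entry<Integer, List<String>> e : termsById.entrySet()) {
            int sum = 0;
            for (String t : e.getValue()) sum += docFreq.get(t);
            popularity.put(e.getKey(), sum);
        }
        String wanted = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : termsById.entrySet()) {
            if (e.getValue().contains(wanted)) hits.add(e.getKey());
        }
        hits.sort((a, b) -> {
            int byLength = Integer.compare(termsById.get(a).size(), termsById.get(b).size());
            if (byLength != 0) return byLength;
            int byPopularity = Integer.compare(popularity.get(b), popularity.get(a));
            if (byPopularity != 0) return byPopularity;
            return Integer.compare(a, b);
        });
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = sampleDocs();
        for (String q : new String[] {"Russia", "Moscow", "Ivanovo", "Altayskaya"}) {
            System.out.println(q + " | " + search(q, docs));
        }
    }
}
```

On the nine sample documents this key gives the same four result lists as
the quoted table. Whether it still matches the intent on a larger index is a
separate question.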
