lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: REPOST from another list: Question related to improving search results
Date Sat, 02 May 2009 10:55:28 GMT
Why not remove that content from every doc during indexing?
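
A rough sketch of what that removal could look like, assuming the template shows up as identical lines on (nearly) every page and that page text is available as plain strings before it is handed to the indexer. `TemplateStripper`, `findTemplateLines`, `strip`, and the `minFraction` threshold are hypothetical names for illustration; none of this is Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: drops lines that appear on (almost) every page before indexing. */
final class TemplateStripper {

    /** Returns the trimmed lines that occur in at least minFraction of all pages. */
    static Set<String> findTemplateLines(List<String> pages, double minFraction) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String page : pages) {
            // Count each distinct line at most once per page (document frequency).
            Set<String> seen = new HashSet<>();
            for (String line : page.split("\n")) {
                String key = line.trim();
                if (!key.isEmpty() && seen.add(key)) {
                    docFreq.merge(key, 1, Integer::sum);
                }
            }
        }
        Set<String> template = new HashSet<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() >= minFraction * pages.size()) {
                template.add(e.getKey());
            }
        }
        return template;
    }

    /** Returns the page text without template lines; this is the text that would be indexed. */
    static String strip(String page, Set<String> templateLines) {
        List<String> kept = new ArrayList<>();
        for (String line : page.split("\n")) {
            if (!templateLines.contains(line.trim())) {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }
}
```

With this, a page that mentions "Galego" only in the shared language list no longer contains the term after stripping, while a page that discusses it in its body still does. Real HTML would probably need the template identified by markup (for example, a sidebar element) rather than by exact line matches.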

Or, if that's too harsh, you could massively reduce the score for hits
in that section: e.g., during indexing, store payloads on the term
occurrences that fall within the common section, and then use
BoostingTermQuery to down-weight those hits.
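
To illustrate the down-weighting idea only, here is a plain-Java sketch that does not call Lucene at all. Each term occurrence carries a weight that stands in for a payload, and a hit's contribution is scaled by that weight. `PayloadWeightSketch`, `Occurrence`, the two weight constants, and the simple sum are assumptions of this sketch; how BoostingTermQuery actually combines payload scores may differ.

```java
import java.util.List;

/** Hypothetical illustration of per-occurrence weights; not Lucene API. */
final class PayloadWeightSketch {

    /** One term occurrence plus a weight, standing in for a per-position payload. */
    static final class Occurrence {
        final String term;
        final float weight;

        Occurrence(String term, float weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    // Assumed values: occurrences in the page body count fully, template ones barely.
    static final float BODY_WEIGHT = 1.0f;
    static final float TEMPLATE_WEIGHT = 0.01f;

    /** Sums the weights of all occurrences of the term (a weighted term frequency). */
    static float weightedTermFrequency(List<Occurrence> doc, String term) {
        float sum = 0f;
        for (Occurrence o : doc) {
            if (o.term.equals(term)) {
                sum += o.weight;
            }
        }
        return sum;
    }
}
```

Under these assumptions, a page that contains "galego" only in the template still matches but scores close to zero, so pages that use the term in their body rank well above it.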

Mike

On Sat, May 2, 2009 at 6:49 AM, Aditya <aditya.kulkarni@gmail.com> wrote:
> Hi,
>
> New to this group.
>
> Question:
>
> Generally, sites like Wikipedia have a template that every page follows.
> These templates contain words that occur on every page.
>
> For example, the Wikipedia template has a list of languages in the left panel.
> These words get indexed every time, since they are not (and cannot be) stop
> words.
>
> If a user searches for "Galego", for example, every Wikipedia page will be in the
> search results, which is wrong because not every Wikipedia page talks about
> "Galego".
>
> Any ideas on how to solve this problem?
>
> Best Regards,
>
> Aditya


