lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anshum <ansh...@gmail.com>
Subject Re: Result ordering
Date Sun, 16 Jan 2011 15:26:04 GMT
Hi Pelit,
Firstly, number of words that match a query in a document is not term
frequency. You may get some more idea on the terminologies used in search
at
http://www.miislita.com/term-vector/term-vector-3.html

Looking at what you're trying to achieve, a few solutions to you would be.
Below, I am assuming your query to be for terms "A B":
1. Look at phrase queries and setting a high slop value so matches like "A
B" are weighted more than "A .... B" (separated by a few positions).
2. Also you could have a custom similarity (advanced)
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/SimilarityDelegator.html
and
use the coord value.
3. For wildcard searches, a brute force mechanism could be, you may have an
or query  (finally the query is expanded to an OR query anyways) for RING OR
RING* and boost the former part.

Looking at your current query seems like you'd need more understanding on
lucene and getting a copy of "Lucene In Action 2nd
Ed<http://www.manning.com/hatcher3/>."
would be  a good idea for you and everyone in your position.
Hope that helps.

--
Anshum Gupta
http://ai-cafe.blogspot.com


On Sun, Jan 16, 2011 at 8:03 PM, Pelit Mamani
<Pelit.Mamani@timetoknow.com>wrote:

> Hi,
>
> I'm maintaining some Lucene-based code, and we're trying to get control
> over result ordering (users aren't happy with the default).
> I know how to boost a Field or Document (very useful).
> But:
>
>
> 1)      Is there a way to boost "OR" queries, based on the number of
> matched terms?
> So the OR query "lord rings" will first show the document "LORD of the
> RINGS" (which holds both words), and only later "selected jewels and RINGS"
> (which only holds one word).
> Is that what you call "Term Frequency"? And how do you boost it further?
> I did a bit of tinkering and got the impression Lucene would boost it by
> default, but not enough - it's sometimes overridden by other boosting
> factors (maybe the boost for short expressions).
>
>
> 2)      Is there a way to boost based on "positions"?
> So "LORD of the RINGS" gets precedence over "LORD of the funny golden
> RINGS", because the search words are positioned closer to each other?
>
>
> 3)      With wildcard searches, is there a way to boost documents that hold
> an exact match.
> So if I search for "ring*", I first see the exact match "story of a RING",
> and only later "a RINGING failure"
>
> Thanx a bunch.
>
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message