lucene-java-user mailing list archives

From Pelit Mamani <Pelit.Mam...@timetoknow.com>
Subject RE: Result ordering
Date Mon, 17 Jan 2011 08:46:24 GMT
Thank you both, Umesh Prasad and Anshum.
You've been a great help.



-----Original Message-----
From: Anshum [mailto:anshumg@gmail.com] 
Sent: Sunday, January 16, 2011 5:26 PM
To: java-user@lucene.apache.org
Subject: Re: Result ordering

Hi Pelit,
Firstly, the number of query words that match in a document is not term frequency; term
frequency is how often a single term occurs within a document. You can read more about
the terminology used in search at http://www.miislita.com/term-vector/term-vector-3.html

For what you're trying to achieve, here are a few possible solutions.
Below, I am assuming your query is for the terms "A B":
1. Look at phrase queries with a high slop value, so that matches like "A B" are weighted
more than "A .... B" (separated by a few positions).
2. You could also write a custom similarity (advanced), see
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/SimilarityDelegator.html
and use the coord value.
3. For wildcard searches, a brute-force approach is an OR query (the wildcard is
eventually expanded to an OR query anyway) for RING OR RING*, with a boost on the
former part.
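
To make the three options above a bit more concrete, here is a minimal sketch in plain
Java (no Lucene classes involved). The class and method names are hypothetical, and the
slop of 10 and boost of 4 are arbitrary values picked for illustration. The coord and
sloppyFreq formulas are meant to mirror what DefaultSimilarity computed in the Lucene
2.x/3.x line; the two strings use the classic query parser syntax for phrase slop
("..."~N), boosts (term^N) and prefix wildcards (term*).

```java
// Plain-Java sketch; nothing here calls into Lucene. All names are hypothetical.
class RankingSketch {

    // Coordination factor: fraction of the query's terms found in the document.
    // "lord rings" vs. a document holding both words -> 2 of 2; only "rings" -> 1 of 2.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Sloppy-phrase weight: shrinks as the matched terms sit further apart.
    // distance is the number of position moves needed to line the terms up.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Query-parser syntax for a phrase that tolerates up to `slop` position moves.
    static String sloppyPhrase(String phrase, int slop) {
        return "\"" + phrase + "\"~" + slop;
    }

    // Exact term boosted above its own prefix expansion.
    static String exactOverPrefix(String term, int boost) {
        return term + "^" + boost + " OR " + term + "*";
    }

    public static void main(String[] args) {
        // Option 1: closer terms get a larger phrase weight.
        System.out.println(sloppyPhrase("lord rings", 10));   // "lord rings"~10
        System.out.println(sloppyFreq(2) > sloppyFreq(4));    // true

        // Option 2: a document matching both terms gets a larger coord.
        System.out.println(coord(2, 2) > coord(1, 2));        // true

        // Option 3: the exact term carries the boost, the wildcard does not.
        System.out.println(exactOverPrefix("ring", 4));       // ring^4 OR ring*
    }
}
```

In a real application such strings would be handed to the query parser, or the
equivalent phrase and boolean queries would be built programmatically; a custom
similarity could then scale coord further if the default weighting is not strong enough.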

Judging by your current query, it seems you'd benefit from a deeper understanding of Lucene.
Getting a copy of "Lucene in Action, 2nd Ed." <http://www.manning.com/hatcher3/>
would be a good idea for you and anyone in your position.
Hope that helps.

--
Anshum Gupta
http://ai-cafe.blogspot.com


On Sun, Jan 16, 2011 at 8:03 PM, Pelit Mamani
<Pelit.Mamani@timetoknow.com> wrote:

> Hi,
>
> I'm maintaining some Lucene-based code, and we're trying to get 
> control over result ordering (users aren't happy with the default).
> I know how to boost a Field or Document (very useful).
> But:
>
>
> 1)      Is there a way to boost "OR" queries, based on the number of
> matched terms?
> So the OR query "lord rings" will first show the document "LORD of the 
> RINGS" (which holds both words), and only later "selected jewels and RINGS"
> (which only holds one word).
> Is that what you call "Term Frequency"? And how do you boost it further?
> I did a bit of tinkering and got the impression Lucene would boost it 
> by default, but not enough - it's sometimes overridden by other 
> boosting factors (maybe the boost for short expressions).
>
>
> 2)      Is there a way to boost based on "positions"?
> So "LORD of the RINGS" gets precedence over "LORD of the funny golden 
> RINGS", because the search words are positioned closer to each other?
>
>
> 3)      With wildcard searches, is there a way to boost documents that hold
> an exact match?
> So if I search for "ring*", I first see the exact match "story of a
> RING", and only later "a RINGING failure".
>
> Thanx a bunch.
>
>
>
>

 
 



