Hi Franck, Thank you so much for the detailed explanation. However, when I tried to break up my MultiFieldQueryParser into a series of BooleanQueries, the result set has got reduced drastically. Any idea why this could be happening? Regards, Niraj ----- Original Message ----- From: "Brisbart Franck" To: "Lucene Users List" Sent: Thursday, June 24, 2004 2:54 PM Subject: Re: score and frequency > The MultiFieldQueryParser give you a BooleanQuery containing 1 query for > each field. > Something like: > BooleanQuery > / | | \ > QF1 QF2 QF3 QF4 (QFx=Query for field x) > > You can still use the MultiFieldQueryParser and create a BooleanQuery to > encapsulate the one parsed + the PhraseQuery, ie: > BooleanQuery(created by you) > / \ > BQ PhraseQuery > > Or create the whole query (I think you should do that) and have > something like that: > _BooleanQuery__ > / | | \ \ > QF1 QF2 QF3 QF4 PhraseQuery (QFx=Query for field x) > > It's like parsing the following query: > (field1:query) (field2:query) (field3:query)...(fieldx:query) > (title:"query")~boost > > > Franck > > > Niraj Alok wrote: > > I asked the previous question since I do not know how to use PhraseQuery > > > > I have one booleanquery and one query. > > The query is Query query = MultiFieldQueryParser.parse( qs, searchLoc, > > flags, new StandardAnalyzer(stop)); > > > > where qs is the word to be searched upon and searchLoc contains all the four > > fields. > > > > How do I insert a PhraseQuery here for title field only, and that too with > > its boosted value? > > > > > > Regards, > > Niraj > > ----- Original Message ----- > > From: "Niraj Alok" > > To: "Lucene Users List" > > Sent: Thursday, June 24, 2004 2:00 PM > > Subject: Re: score and frequency > > > > > > > >>Does it mean that I would need to abandon MultiFieldQueryParser? > >> > >>Regards, > >>Niraj > >>----- Original Message ----- > >>From: "Brisbart Franck" > >>To: "Lucene Users List" > >>Sent: Thursday, June 24, 2004 1:22 PM > >>Subject: Re: score and frequency > >> > >> > >> > >>>Hi, > >>>first, what do you consider as an 'exact matching' ? It seems that you > >>>treat the search word by word, so 'lion sea' will be an 'exact match' of > >>>'sea-lion'. > >>>I think you should add a PhraseQuery to your query containing the title > >>>and with a big boost. So, you don't need to boost your title field. Only > >>>the results matching exactly (for the PhraseQuery) will be boosted. > >>> > >>>Franck > >>> > >>> > >>>Niraj Alok wrote: > >>> > >>>>Hi Guys, > >>>> > >>>>I seem to have run into rough weather again. > >>>>To describe the problem as concisely as possible, I have four fields > > > > to > > > >>search upon : title , first para, rest of the paras and content (equal to > >>title + first para + rest of the para) . I am doing this by using > >>MultiFieldQueryParser. > >> > >>>>Now there is a very complicated ranking algrorithm specified by the > >> > >>client and I have met most of them except one or two and really need your > >>help as all my other efforts have failed. > >> > >>>>The most important rule is that exact matching titles should come > > > > first > > > >>, i.e. get higher scores. > >> > >>>>I have given the highest boost factor to the title than the rest but > > > > the > > > >>problem comes up when there is some other title which has got just one > > > > word > > > >>matching. For e.g., if I search for lion, there is a title sea-lion which > >>also has the same boost factor as that of "lion" in the index. Also, > >>sea-lion has got some more "lion" in its first para or rest of the paras > >>etc. such that its score comes higher than "lion". > >> > >>>>Is there some way to get the exact matching titles higher scores? > >>>>Please reply soon. > >>>> > >>>> > >>>>Regards, > >>>>Niraj > >>>> > >>>> > >>>>----- Original Message ----- > >>>>From: "Brisbart Franck" > >>>>To: "Lucene Users List" > >>>>Sent: Monday, June 07, 2004 12:50 PM > >>>>Subject: Re: score and frequency > >>>> > >>>> > >>>> > >>>> > >>>>>It seems that you don't the length norm to be used. It's a factor > > > > which > > > >>>>>normalize the score of a doc depending on the size of the searched > > > > field > > > >>>>>of the doc. It's the field which make that 'ground ice' has a higher > >>>>>score than 'ice hockey: British Sekonda Superleague Play-Off > >>>>>Championship: finals' because it only has 2 terms. > >>>>>So, I suggest you to override the lengthNorm method and to ignore the > >>>>>numTokens parameter. > >>>>>NB: The length norm is computed during the indexation and the norm are > >>>>>store in the index (in the _aaa.f# files). So, you need to do re-index > >>>>>your data, and use this similarity during the indexation. > >>>>> > >>>>>Cheers, > >>>>>Franck > >>>>> > >>>>> > >>>>>Niraj Alok wrote: > >>>>> > >>>>> > >>>>>>I have set the searcher.setSimilarity as well as also tried setting > >> > >>the > >> > >>>>>>coord factor to 1. > >>>>>> > >>>>>>The problem as given by an example is : Lets say I have titles to be > >>>>>>displayed depending upon the search. > >>>>>>E.g if i have "ice hockey" as the search item and if it is default > >>>>>>similarity, my results are : > >>>>>> > >>>>>>ice hockey0.99999994 > >>>>>>ice hockey0.75 > >>>>>>ice hockey0.75 > >>>>>>winter Olympics: hockey, ice, medallists0.17402513 > >>>>>>ice age0.073680125 > >>>>>>National Hockey League0.020266924 > >>>>>>Cracking the Ice Age0.018420031 > >>>>>>ground-ice0.011512519 > >>>>>>ice hockey: British Sekonda Superleague Play-Off Championship: > >>>>>>finals0.0069075115 > >>>>>>(the numbers indicating the score). > >>>>>> > >>>>>> > >>>>>>But if i set the similarity as my overridden one, the results become: > >>>>>>ice hockey0.99999994 > >>>>>>ice hockey0.75 > >>>>>>ice hockey0.75 > >>>>>>ice age0.22104037 > >>>>>>winter Olympics: hockey, ice, medallists0.17402513 > >>>>>>National Hockey League0.060800765 > >>>>>>Cracking the Ice Age0.055260092 > >>>>>>ground-ice0.034537554 > >>>>>>ice hockey: British Sekonda Superleague Play-Off Championship: > >>>>>>finals0.020722535 > >>>>>> > >>>>>> > >>>>>>I want all the titles which have both "ice" and "hockey" to come > > > > above > > > >>the > >> > >>>>>>rest (to have higher scores) > >>>>>>Meaning i would wish the results to appear like: > >>>>>> > >>>>>>ice hockey > >>>>>>ice hockey > >>>>>>ice hockey > >>>>>>winter Olympics: hockey, ice, medallists > >>>>>>ice hockey: British Sekonda Superleague Play-Off Championship: finals > >>>>>>ice age > >>>>>>National Hockey League > >>>>>>Cracking the Ice Age > >>>>>>ground-ice > >>>>>> > >>>>>>My overriden similarity class contains just this method: > >>>>>>public float coord(int overlap, int maxOverlap) { > >>>>>> > >>>>>>return 1.0f; > >>>>>> > >>>>>>} > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>I feel it is the weight factor which is producing indesirable > > > > results. > > > >>Any > >> > >>>>>>help in this regard would be highly appreciated. > >>>>>> > >>>>>>Regards, > >>>>>>Niraj > >>>>>> > >>>>>>----- Original Message ----- > >>>>>>From: "Brisbart Franck" > >>>>>>To: "Lucene Users List" > >>>>>>Sent: Friday, June 04, 2004 8:46 PM > >>>>>>Subject: Re: score and frequency > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>Hi, > >>>>>>> > >>>>>>>Be careful to set the default similarity > >>>>>>>'Similarity.setDefault(similarity)' before creating your search > >> > >>instance > >> > >>>>>>>(IndexSearcher). > >>>>>>>If you change the default similarity after, you'll still use the old > >> > >>one. > >> > >>>>>>>You'd better use the 'searcher.setSimilarity' method on your > > > > searcher. > > > >>>>>>>Franck > >>>>>>> > >>>>>>> > >>>>>>>Phil brunet wrote: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>>Hi to all. > >>>>>>>> > >>>>>>>>Maybe the term frequency is not the only parameter you need to > >> > >>override > >> > >>>>>>>>to "customize" the score attributed by Lucene. > >>>>>>>> > >>>>>>>>Maybe you should consider the normalisation factor, the idf and the > >>>>>>>>coord factor ? > >>>>>>>> > >>>>>>>>Philippe > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>From: "Niraj Alok" > >>>>>>>>>Reply-To: "Lucene Users List" > >>>>>>>>>To: "Lucene Users List" > >>>>>>>>>Subject: Re: score and frequency > >>>>>>>>>Date: Fri, 4 Jun 2004 15:13:32 +0530 > >>>>>>>>> > >>>>>>>>>Hi Erik, > >>>>>>>>> > >>>>>>>>>Thanks for the suggestion. > >>>>>>>>> > >>>>>>>>>I tried this: > >>>>>>>>>public class RelevanceSimilarity extends DefaultSimilarity > >>>>>>>>> > >>>>>>>>>{ > >>>>>>>>> > >>>>>>>>>public float tf(float freq) { > >>>>>>>>> > >>>>>>>>>System.out.println("discounting frequency"); > >>>>>>>>> > >>>>>>>>>return (float)1; > >>>>>>>>> > >>>>>>>>>} > >>>>>>>>> > >>>>>>>>>} > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>and in my query class, I used : > >>>>>>>>> > >>>>>>>>>Similarity.setDefault(similarity); > >>>>>>>>> > >>>>>>>>>Hits hits = is.search(query); > >>>>>>>>> > >>>>>>>>>for(i = 0; i < hits.length(); i ++) > >>>>>>>>> > >>>>>>>>>result = result + hits.score(i); > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>However, this is still not giving me the expected result. Do I > > > > need > > > >>to > >> > >>>>>>do > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>>something else? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>Regards, > >>>>>>>>>Niraj > >>>>>>>>> > >>>>>>>>>----- Original Message ----- > >>>>>>>>>From: "Erik Hatcher" > >>>>>>>>>To: "Lucene Users List" > >>>>>>>>>Sent: Friday, June 04, 2004 1:55 PM > >>>>>>>>>Subject: Re: score and frequency > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>>Hi, > >>>>>>>>>>> > >>>>>>>>>>>I am having some problems with the score of lucene. > >>>>>>>>>>>I am trying to get the results displayed according to hits.score > >>>>>> > >>>>>>and > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>>>>it is giving the results correctly. > >>>>>>>>>>>However I do not want the frequency factor to be used for the > >>>>>>>>>>>computation of the score. > >>>>>>>>>>> > >>>>>>>>>>>Is it possible to get the score which does not have the > > > > frequency > > > >>>>>>>>>>>factor in it ? > >>>>>>>>>> > >>>>>>>>>>Have a look at the javadocs for Similarity. DefaultSimilarity is > >>>>>> > >>>>>>used > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>>>unless otherwise specified. You could subclass that and override > >>>>>> > >>>>>>this: > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>>> public float tf(float freq) { > >>>>>>>>>> return (float)Math.sqrt(freq); > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>>and return 1.0. This might give you the effect you want. > >>>>>>>>>> > >>>>>>>>>>Erik > >>>>>>>>>> > >>>>>>>>>> > >>> > >>>>>>>>-------------------------------------------------------------------- - > >>>>>>>> > >>>>>>>>>>To unsubscribe, e-mail: > > > > lucene-user-unsubscribe@jakarta.apache.org > > > >>>>>>>>>>For additional commands, e-mail: > >> > >>lucene-user-help@jakarta.apache.org > >> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>--------------------------------------------------------------------- > >>>>>>> > >>>>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > >>>>>>>>>For additional commands, e-mail: > > > > lucene-user-help@jakarta.apache.org > > > >>>>>>>>_________________________________________________________________ > >>>>>>>>Bloquez les fenêtres pop-up, c'est gratuit ! http://toolbar.msn.fr > >>>>>>>> > >>>>>>>> > >>> > >>>>>>--------------------------------------------------------------------- > >>>>>> > >>>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > >>>>>>>>For additional commands, e-mail: > > > > lucene-user-help@jakarta.apache.org > > > >>>>>>> > >>>>>>>-- > >>>>>>>Franck Brisbart > >>>>>>>R&D > >>>>>>>http://www.kelkoo.com > >>>>>>> > >>>>>>> > >>> > >>>>>--------------------------------------------------------------------- > >>>>> > >>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > >>>>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org > >>>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>--------------------------------------------------------------------- > >>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > >>>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org > >>>>>> > >>>>> > >>>>> > >>>>>-- > >>>>>Franck Brisbart > >>>>>R&D > >>>>>http://www.kelkoo.com > >>>>> > >>>>> > >>>>>--------------------------------------------------------------------- > >>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > >>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org > >>>> > >>>> > >>>> > >>> > >>>-- > >>>Franck Brisbart > >>>R&D > >>>http://www.kelkoo.com > >>> > >>> > >>>--------------------------------------------------------------------- > >>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > >>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org > >>> > >> > >> > >> > >> > >>--------------------------------------------------------------------- > >>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > >>For additional commands, e-mail: lucene-user-help@jakarta.apache.org > >> > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org > > > > > -- > Franck Brisbart > R&D > http://www.kelkoo.com > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > For additional commands, e-mail: lucene-user-help@jakarta.apache.org >