lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jamal H Tandina <jtand...@yahoo.fr>
Subject RE : Re: RE : Re: RE : Re: problem undestanding the hits.score
Date Sun, 11 Nov 2007 13:05:54 GMT

 Thanks you for your reply

 The thing is i'am trying to emplement a weight for a word form indexing html
 web pages.

 The is like :

*50% +  Weigth(word in doc d) = *20% + * 10% +
 ...

 the code is :
 =============================================================
 doc.add(new Field("url", httpd.getURL(), Field.Store.YES,
 Field.Index.UN_TOKENIZED));

 Field tiTle = new
 Field("title",httpd.getTitle(),Field.Store.YES,Field.Index.TOKENIZED );
 tiTle.setBoost(1/2f);   //50%
 doc.add(tiTle);


 Field keyWords1 = new
 Field("keywords",httpd.getKeywords(),Field.Store.YES,Field.Index.TOKENIZED
 );
 keyWords1.setBoost(2/5f);  //40% 2/5
 doc.add(keyWords1);

 Field contentK1 = new
 Field("contentKeywords",httpd.getContentKeywords(),Field.Store.NO,Field.Index.TOKENIZED
 );
 contentK1.setBoost(1/10f);  //10%
 doc.add(contentK1);

 writer.addDocument(doc);
 ===========================================================

 and the hit.score return just a field score when we do the searching.  how
 can i get this together

*50% +  Weigth(word in doc d) = *20% + * 10% +
 ...

 THANK YOU FOR YOUR TIME


> Erick Erickson  a écrit : I strongly recommend
> against this. Simple word counts are a poor
> measure of relevance. Which is why Lucene doesn't score that
> way. Do you have an example showing why the default scoring is
> inadequate or is this just an assumption?
>
> It would be helpful if you gave us some idea of what you're trying
> to accomplish. What is the use-case you're trying to solve? That
> would generate more helpful responses I think....
>
> Best
> Erick
>
> On 11/2/07, Jamal H Tandina  wrote:
> >
> > <<<<
> >
> > If you want to give priority to documents that are larger, like z1, you
> >
> > should change the DefaultSimilarity (at index  time), more exactly the
> > method:
> >
> >   public float lengthNorm(String fieldName, int numTerms) {
> >     return (float)(1.0 / Math.sqrt(numTerms));
> >   }
> >
> > to something like this
> >
> >   public float lengthNorm(String fieldName, int numTerms) {
> >     return (float)(Math.sqrt(numTerms));
> >
> >
> >   }
> > >>>
> >
> > I want to give priority to documents that have the word we are searching
> > more frequent !
> >
> > Thank you
> >
> >
> >
> > Jamal H Tandina  a écrit : Thank you  for your reply
> >
> > How can i change the defaultSimilarity in the indexing and the searching,
> > do you have an example or an url how to set the Similarity  ?
> >
> >
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html
> >
> > Thanks again
> >
> > Ion Badita  a  écrit : Try too look at Similarity, there you will find
> > thinks about the
> > scoring. Your query is more "similar" with the shorter document.
> > If you have 2 documents with a field body; first with words "red flower"
> > and the second with just one word "flower", and search for the word
> > "flower", the second document will score high because is very similar
> > with the query.
> >
> > If you want to give priority to documents that are larger, like z1, you
> > should change the DefaultSimilarity (at index time), more exactly the
> > method:
> >
> >   public float lengthNorm(String fieldName, int numTerms) {
> >     return (float)(1.0 / Math.sqrt(numTerms));
> >   }
> >
> > to something like this
> >
> >   public float lengthNorm(String fieldName, int numTerms) {
> >     return (float)(Math.sqrt(numTerms));
> >   }
> >
> >
> > Reindex your documents with the Similarity  modified and try to search
> > again. The IndexWriter has a method to set the similarity used for
> > indexing.
> >
> >
> > I hope this will help you...
> >
> >
> > Ion
> >
> >
> >
> > Jamal jamalator wrote:
> > >  Hi
> > >
> > > I have indexed this html document
> > > =============z1========================
> > >
> > >
> > > zo zo zo zo zo zo zo zo zo zo zo zo
> >
> > > zo zo zo zo zo zo zo zo zo zo zo zo
> >
> > > zo zo zo zo zo zo zo zo zo zo zo zo
> > >
> > >
> > > =============z2=========================
> > >
> > >
> > >  zo zo zo zo zo zo zo zo zo zo zo zo
> >
> > >  zo zo zo zo zo zo zo zo zo zo zo zo
> >
> > >
> > >
> > > =============z3==========================
> > >
> > >
> > >  zo zo zo zo zo zo zo zo zo zo zo zo
> >
> > >
> >  >
> > > =========================================
> > > with this code
> > >
> > > Field contentK1 = new  Field("htmlcontent",httpd.getContentKeywords(),
> > Field.Store.NO,Field.Index.TOKENIZED );
> > > contentK1.setBoost(1/10f);  //10%
> > > doc.add(contentK1);
> > >
> > > and when a search "zo" with luke i have (whitespaceanalyser):
> > >
> > > (score , id   )
> > > (0,0957,z2 )
> > > (0,0947,z3 )
> > > (0,0938,z1)
> > >
> > > NORMALY the resut expected have to be z1 z2 z3
> > >
> > > Some One have an idea ??
> > >
> > > Thank you all
> > >
> > >
> > >
> > > ---------------------------------
> > >   Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers
> > Yahoo! Mail
> > >
> > >
> > > ---------------------------------
> > >  Ne gardez plus qu'une seule  adresse mail ! Copiez vos mails vers
> Yahoo!
> > Mail
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> >
> > ---------------------------------
> > Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo!
> > Mail
> >
> >
> > ---------------------------------
> > Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo!
> > Mail
>
>
>
> ---------------------------------
>   Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo!
> Mail
>
>
> ---------------------------------
>  Stockage illimité de vos mails avec Yahoo! Mail. Changez aujourd'hui de
> mail !


             
---------------------------------
 Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail 
Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message