lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Boost Document
Date Sat, 18 Nov 2006 15:52:09 GMT
I can answer a small part of your question... Doc IDs have nothing to do
with scoring.  Each time you index a document, it get a doc id greater than
any already in the index, and they get reassigned if you delete docs and
optimize.... They *may* be used when scoring to break ties but that doesn't
do you any good.......

For the other part of your question, I'll defer to the experts....

Erick

On 11/18/06, John Pailet <9i7svel2zr6uujn@jetable.org> wrote:
>
>
> Hi Chris,
>
> You are right !!!
> Here is the explain output:
>
> ----- DOC 222-----home-
> 40960.0 = fieldWeight(WORD:home in 0), product of:
>   1.0 = tf(termFreq(WORD:home)=1)
>   1.0 = idf(docFreq=2)
>   40960.0 = fieldNorm(field=WORD, doc=0)
>
> ----- DOC 111-----home-
> 40960.0 = fieldWeight(WORD:home in 1), product of:
>   1.0 = tf(termFreq(WORD:home)=1)
>   1.0 = idf(docFreq=2)
>   40960.0 = fieldNorm(field=WORD, doc=1)
>
> As you can see, doc 222 and doc 111 get exactly the same score, even if
> data
> are exactly the same and even id Doc111 boost value is greater
> than  Doc222
> boost value !!!
>
> Is the internal document ID part of score computation ?
>
> How can I solve this problem of gertting same score for different boost
> values?
>
> Thanks a lot !
>
> John
>
>
>
>
>
> Chris Hostetter wrote:
> >
> >
> > : When I have 2 documents that have exactly the same data, but different
> > boost
> > : value.
> > : The order does not respect the boost value. It the following exemples,
> > the
> > : first document of the search is the document with the lower boost
> > value...
> > : is it a bug ?
> >
> > i would suggest you look at the explain output ... but it seems you
> > already are: what does that tell you? what scores do the indvidual
> > documents get, and how do their explanations differ?
> >
> > if i had to guess, i would suspect that your boosts are big enough that
> > they become equivilent when byte encoded and stored as fieldNorms
> (that's
> > what happens to document boosts, they get written as part of hte
> fieldNorm
> > for each field in the document) and the only reason you are getting the
> > order that you are is because doc2 has a lower internal id then doc1
> ....
> > if i'm right, then the scores and the explanations would be equal.
> >
> > ...does that jive with what you are seeing?
> >
> >
> > :
> > : Here is my test code:
> > :
> > : public void testIt() throws IOException, ParseException
> > :     {
> > :             RAMDirectory ramDir = new RAMDirectory();
> > :             IndexWriter writer = new IndexWriter(ramDir, new
> StandardAnalyzer(),
> > : true);
> > :
> > :             Document doc1 = new Document();
> > :             Field word = new Field("WORD","home",
> > : Field.Store.YES,Field.Index.TOKENIZED);
> > :             word.setBoost(15);
> > :             doc1.add(word);
> > :             Field id = new Field("ID","111", Field.Store.YES,
> Field.Index.NO);
> > :             doc1.add(id);
> > :             doc1.setBoost(3163);
> > :
> > :             Document doc2 = new Document();
> > :             word = new Field("WORD","home",
> > Field.Store.YES,Field.Index.TOKENIZED);
> > :             word.setBoost(15);
> > :             doc2.add(word);
> > :             id = new Field("ID","222", Field.Store.YES,Field.Index.NO
> );
> > :             doc2.add(id);
> > :             doc2.setBoost(3150);
> > :
> > :
> > :             writer.addDocument(doc2);
> > :             writer.addDocument(doc1);
> > :
> > :             writer.optimize();
> > :             writer.close();
> > :
> > :             Searcher searcher = new IndexSearcher(ramDir);
> > :
> > :             QueryParser queryParserWord = new QueryParser("WORD",new
> > : StandardAnalyzer());
> > :             Query query = queryParserWord.parse("home");
> > :
> > :             Hits hits = searcher.search(query);
> > :
> > :             for (int i = 0; i < hits.length(); i++) {
> > :                     System.out.println("----- DOC " + hits.doc(i).get("ID")
> +"-----" +
> > : hits.doc(i).get("WORD")+"-");
> > :                     System.out.println(searcher.explain(query,hits.id
> (i)).toString());
> > :             }
> > :     }
> > :
> > :
> > : Thanks a lot !
> > : --
> > : View this message in context:
> > http://www.nabble.com/Boost-Document-tf2654631.html#a7405959
> > : Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> > :
> > :
> > : ---------------------------------------------------------------------
> > : To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > : For additional commands, e-mail: java-user-help@lucene.apache.org
> > :
> >
> >
> >
> > -Hoss
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Boost-Document-tf2654631.html#a7417101
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message