lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From manjula wijewickrema <manjul...@gmail.com>
Subject Re: Lucene Scoring
Date Wed, 07 Jul 2010 08:36:04 GMT
Dear Ian,

Thanks a lot for your reply. The way you proposed, working correctly and
solved half of my matter.
Once I run the program, system gave me the following output.
output-
**********************************
Searching for 'milk'

Number of hits: 1

0.13287117

0.13287117 = (MATCH) fieldWeight(contents:milk in 0), product of:

1.7320508 = tf(termFreq(contents:milk)=3)

0.30685282 = idf(docFreq=1, maxDocs=1)

0.25 = fieldNorm(field=contents, doc=0)

Hit: D:\JADE\work\MobilNet\Lucene291\filesToIndex\deron-foods.txt
***********************************************************************************
Here, I have no any problems of calculating values for tf, and idf. But I
have no idea of how to calculate fieldNorm. According to
http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int)
I think norm(t,d) gives the value for fieldNorm and in my case, the system
returns the value lengthNorm(field) for norm(t,d),

1) Am I correct?
2) If so, coluld you pls. let me know the way (formula) of calculating
lengthNorm(field)? (I checked several documents and codes to understand
this. But was unable to find the mathematical formula behind this method).
3) If lengthNorm(field) is not the case behind fieldNorm, then how to
calculate fieldNorm?

Pls. help me to resolve this matter.

Manjula.


On Tue, Jul 6, 2010 at 12:47 PM, Ian Lea <ian.lea@gmail.com> wrote:

> You are calling the explain method incorrectly.  You need something like
>
>  System.out.println(indexSearcher.explain(query, 0));
>
>
> See the javadocs for details.
>
>
> --
> Ian.
>
>
> On Tue, Jul 6, 2010 at 7:39 AM, manjula wijewickrema
> <manjula53@gmail.com> wrote:
> > Dear Grant,
> >
> > Thanks a lot for your guidence. As you have mentioned, I tried to use
> > explain() method to get the explanations for relevant scoring. But, once
> I
> > call the explain() method, system indicated the following error.
> >
> > Error-
> > 'The method explain(Query,int) in the type Searcher is not applicable for
> > the arguments (String, int)'.
> >
> > In my code I call the explain() method as follows-
> > Searcher.explain("rice",0);
> >
> > Possibly the wrong with my way of passing parameters. In my case, I have
> > chosen "rice" as my query and indexed only one document.
> >
> > Could you pls. let me know what's wrong with this. I also included the
> code
> > with this.
> >
> > Thanx
> > Manjula
> >
> > code-
> > **
> >
> > *import* org.apache.lucene.search.Searcher;
> >
> > *public* *class* LuceneDemo {
> >
> > *public* *static* *final* String *FILES_TO_INDEX_DIRECTORY* =
> "filesToIndex"
> > ;
> >
> > *public* *static* *final* String *INDEX_DIRECTORY* = "indexDirectory";
> >
> > *public* *static* *final* String *FIELD_PATH* = "path";
> >
> > *public* *static* *final* String *FIELD_CONTENTS* = "contents";
> >
> > *public* *static* *void* main(String[] args) *throws* Exception {
> >
> > *createIndex*();
> >
> > *searchIndex*("rice");
> >
> >  }
> >
> > *public* *static* *void* createIndex() *throws* CorruptIndexException,
> > LockObtainFailedException, IOException {
> >
> >  SnowballAnalyzer analyzer = *new* SnowballAnalyzer( "English",
> > StopAnalyzer.ENGLISH_STOP_WORDS);
> >
> > *boolean* recreateIndexIfExists = *true*;
> >
> > IndexWriter indexWriter = *new* IndexWriter(*INDEX_DIRECTORY*, analyzer,
> > recreateIndexIfExists);
> >
> > File dir = *new* File(*FILES_TO_INDEX_DIRECTORY*);
> >
> > File[] files = dir.listFiles();
> >
> > *for* (File file : files) {
> >
> > Document document = *new* Document();
> >
> > String path = file.getCanonicalPath();
> >
> > document.add(*new* Field(*FIELD_PATH*, path, Field.Store.*YES*,
> Field.Index.
> > UN_TOKENIZED,Field.TermVector.*YES*));
> >
> > Reader reader = *new* FileReader(file);
> >
> > document.add(*new* Field(*FIELD_CONTENTS*, reader));
> >
> > indexWriter.addDocument(document);
> >
> >  }
> >
> > indexWriter.optimize();
> >
> > indexWriter.close();
> >
> > }
> >
> > *public* *static* *void* searchIndex(String searchString)
> > *throws*IOException, ParseException {
> >
> > System.*out*.println("Searching for '" + searchString + "'");
> >
> > Directory directory = FSDirectory.getDirectory(*INDEX_DIRECTORY*);
> >
> > IndexReader indexReader = IndexReader.open(directory);
> >
> > IndexSearcher indexSearcher = *new* IndexSearcher(indexReader);
> >
> >  SnowballAnalyzer analyzer = *new* SnowballAnalyzer( "English",
> > StopAnalyzer.ENGLISH_STOP_WORDS);
> >
> > QueryParser queryParser = *new* QueryParser(*FIELD_CONTENTS*, analyzer);
> >
> > Query query = queryParser.parse(searchString);
> >
> > Hits hits = indexSearcher.search(query);
> >
> > System.*out*.println("Number of hits: " + hits.length());
> >
> > TopDocs results = indexSearcher.search(query,10);
> >
> > ScoreDoc[] hits1 = results.scoreDocs;
> >
> > *for* (ScoreDoc hit : hits1) {
> >
> > Document doc = indexSearcher.doc(hit.doc);
> >
> > //System.out.printf("%5.3f %s\n",hit.score,doc.get(FIELD_CONTENTS));
> >
> > System.*out*.println(hit.score);
> >
> > Searcher.explain("rice",0);
> >
> > }
> >
> >  Iterator<Hit> it = hits.iterator();
> >
> > *while* (it.hasNext()) {
> >
> > Hit hit = it.next();
> >
> > Document document = hit.getDocument();
> >
> > String path = document.get(*FIELD_PATH*);
> >
> > System.*out*.println("Hit: " + path);
> >
> > }
> >
> > }
> >
> > }
> >
> >
> > On Mon, Jul 5, 2010 at 7:46 PM, Grant Ingersoll <gsingers@apache.org>
> wrote:
> >
> >>
> >> On Jul 5, 2010, at 5:02 AM, manjula wijewickrema wrote:
> >>
> >> > Hi,
> >> >
> >> > In my application, I input only single term query (at one time) and
> get
> >> back
> >> > the corresponding scorings for those queries. But I am little
> struggling
> >> of
> >> > understanding Lucene scoring. I have reffered
> >> >
> >>
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
> >> > and
> >> > some other pages to resolve my matters. But some are still remain.
> >> >
> >> > 1) Why it has taken the squareroot of frequency as the tf value and
> >> square
> >> > of the idf vale in score function?
> >>
> >> Somewhat arbitrary, I suppose, but I think someone way back did some
> tests
> >> and decided it performed "best" in general.  More importantly, the point
> of
> >> the Similarity class is you can override these if you desire.
> >>
> >> >
> >> > 2) If I enter single term query, then what will return bythe
> coord(q,d)?
> >> > Since there are always one term in the query, I think always it should
> be
> >> 1!
> >> > Am I correct?
> >>
> >> Should be.  You can run the explain() method to confirm.
> >>
> >> >
> >> > 3) I am also struggling understanding sumOfSquaredWeights (in
> >> queryNorm(q)).
> >> > As I can understand, this value depends on the nature of the query we
> >> input
> >> > and depends on that, it uses different methods such as TermQuery,
> >> > MultiTermQuery, BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery,
> >> etc.
> >> > But if I always use single term query, then what will be the way
> selected
> >> by
> >> > the system from above?
> >>
> >> The queryNorm is an attempt at making scores comparable across queries.
> >>  Again, I'd try the explain() method to see the practical aspects of how
> it
> >> effects score.
> >>
> >> See http://lucene.apache.org/java/2_4_0/scoring.html for more info on
> >> scoring.
> >>
> >> -Grant
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message