lucene-java-user mailing list archives

From manjula wijewickrema <manjul...@gmail.com>
Subject Re: Lucene Scoring
Date Tue, 06 Jul 2010 06:39:27 GMT
Dear Grant,

Thanks a lot for your guidance. As you suggested, I tried to use the
explain() method to get the explanations for the relevant scores. But when I
call the explain() method, I get the following error.

Error-
'The method explain(Query,int) in the type Searcher is not applicable for
the arguments (String, int)'.

In my code I call the explain() method as follows-
Searcher.explain("rice",0);

Possibly something is wrong with the way I pass the parameters. In my case, I
have chosen "rice" as my query and indexed only one document.

Could you please let me know what is wrong here? I have also included the code
below.

Thanks
Manjula

code-
import org.apache.lucene.search.Searcher;

public class LuceneDemo {

    public static final String FILES_TO_INDEX_DIRECTORY = "filesToIndex";
    public static final String INDEX_DIRECTORY = "indexDirectory";
    public static final String FIELD_PATH = "path";
    public static final String FIELD_CONTENTS = "contents";

    public static void main(String[] args) throws Exception {
        createIndex();
        searchIndex("rice");
    }

    public static void createIndex() throws CorruptIndexException,
            LockObtainFailedException, IOException {
        SnowballAnalyzer analyzer = new SnowballAnalyzer("English",
                StopAnalyzer.ENGLISH_STOP_WORDS);
        boolean recreateIndexIfExists = true;
        IndexWriter indexWriter = new IndexWriter(INDEX_DIRECTORY, analyzer,
                recreateIndexIfExists);

        File dir = new File(FILES_TO_INDEX_DIRECTORY);
        File[] files = dir.listFiles();
        for (File file : files) {
            Document document = new Document();
            String path = file.getCanonicalPath();
            document.add(new Field(FIELD_PATH, path, Field.Store.YES,
                    Field.Index.UN_TOKENIZED, Field.TermVector.YES));
            Reader reader = new FileReader(file);
            document.add(new Field(FIELD_CONTENTS, reader));
            indexWriter.addDocument(document);
        }

        indexWriter.optimize();
        indexWriter.close();
    }

    public static void searchIndex(String searchString)
            throws IOException, ParseException {
        System.out.println("Searching for '" + searchString + "'");

        Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
        IndexReader indexReader = IndexReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);

        SnowballAnalyzer analyzer = new SnowballAnalyzer("English",
                StopAnalyzer.ENGLISH_STOP_WORDS);
        QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);
        Query query = queryParser.parse(searchString);

        Hits hits = indexSearcher.search(query);
        System.out.println("Number of hits: " + hits.length());

        TopDocs results = indexSearcher.search(query, 10);
        ScoreDoc[] hits1 = results.scoreDocs;
        for (ScoreDoc hit : hits1) {
            Document doc = indexSearcher.doc(hit.doc);
            //System.out.printf("%5.3f %s\n",hit.score,doc.get(FIELD_CONTENTS));
            System.out.println(hit.score);
            Searcher.explain("rice",0);
        }

        Iterator<Hit> it = hits.iterator();
        while (it.hasNext()) {
            Hit hit = it.next();
            Document document = hit.getDocument();
            String path = document.get(FIELD_PATH);
            System.out.println("Hit: " + path);
        }
    }
}
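
The error text says that explain() expects a Query and an int, not a String and an int. The sketch below illustrates that call shape using small stand-in classes. DemoQuery, DemoHit, DemoSearcher and parse() are hypothetical names invented for this illustration and are not Lucene classes; only the explain(Query,int) signature is taken from the error message above. It shows three things: the raw string is parsed into a query object first, explain is called on a searcher instance rather than on the type, and the document id comes from each hit instead of being hard-coded.

```java
import java.util.Arrays;
import java.util.List;

public class ExplainCallSketch {

    // Stand-in for a parsed query object (hypothetical, NOT Lucene's Query).
    static final class DemoQuery {
        final String field;
        final String term;

        DemoQuery(String field, String term) {
            this.field = field;
            this.term = term;
        }

        @Override
        public String toString() {
            return field + ":" + term;
        }
    }

    // Stand-in for a search hit that carries a document id and a score.
    static final class DemoHit {
        final int doc;
        final float score;

        DemoHit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Stand-in searcher; its explain method mirrors the reported
    // signature explain(Query,int) as an instance method.
    static final class DemoSearcher {
        String explain(DemoQuery query, int doc) {
            return "explanation of " + query + " for doc " + doc;
        }
    }

    // Stand-in for the parsing step: raw string in, query object out.
    static DemoQuery parse(String defaultField, String text) {
        return new DemoQuery(defaultField, text.trim().toLowerCase());
    }

    public static void main(String[] args) {
        DemoSearcher searcher = new DemoSearcher();
        DemoQuery query = parse("contents", "rice");
        List<DemoHit> hits = Arrays.asList(new DemoHit(0, 0.5f));

        for (DemoHit hit : hits) {
            // Pass the query object (not the raw string) and the hit's own id.
            System.out.println(searcher.explain(query, hit.doc));
        }
    }
}
```

In the posted code, the analogous change would presumably be to call explain on the indexSearcher instance with the already parsed query and the hit's document id, i.e. indexSearcher.explain(query, hit.doc), and to print the Explanation it returns.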


On Mon, Jul 5, 2010 at 7:46 PM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> On Jul 5, 2010, at 5:02 AM, manjula wijewickrema wrote:
>
> > Hi,
> >
> > In my application, I submit only single-term queries (one at a time) and
> > get back the corresponding scores. But I am struggling a little to
> > understand Lucene scoring. I have referred to
> > http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
> > and some other pages to resolve my questions, but some still remain.
> >
> > 1) Why does the score function take the square root of the frequency as
> > the tf value and the square of the idf value?
>
> Somewhat arbitrary, I suppose, but I think someone way back did some tests
> and decided it performed "best" in general.  More importantly, the point of
> the Similarity class is you can override these if you desire.
>
> >
> > 2) If I enter a single-term query, what will coord(q,d) return? Since
> > there is always one term in the query, I think it should always be 1.
> > Am I correct?
>
> Should be.  You can run the explain() method to confirm.
>
> >
> > 3) I am also struggling to understand sumOfSquaredWeights (in
> > queryNorm(q)). As I understand it, this value depends on the kind of query
> > we input, and depending on that, it uses different query types such as
> > TermQuery, MultiTermQuery, BooleanQuery, WildcardQuery, PhraseQuery,
> > PrefixQuery, etc. But if I always use a single-term query, which of these
> > will the system select?
>
> The queryNorm is an attempt at making scores comparable across queries.
> Again, I'd try the explain() method to see the practical aspects of how it
> affects the score.
>
> See http://lucene.apache.org/java/2_4_0/scoring.html for more info on
> scoring.
>
> -Grant
>
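
To make the three questions above concrete, here is a minimal sketch of the single-term case in plain Java with no Lucene dependency. The formulas are the ones described for the default Similarity on the 2.4 pages linked above, as far as I understand them: tf is the square root of the frequency, idf is 1 + ln(numDocs / (docFreq + 1)), coord is overlap / maxOverlap, and queryNorm is 1 / sqrt(sumOfSquaredWeights). The class name, method names and the sample numbers are assumptions made for illustration. Real Lucene works in floats and stores field norms with reduced precision, so the values that explain() prints may differ slightly.

```java
public class SingleTermScoreSketch {

    // tf(t in d): square root of the term's frequency in the document.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf(t): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // coord(q,d): fraction of the query's terms that occur in the document.
    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    // queryNorm(q): 1 / sqrt(sumOfSquaredWeights).
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // Score of a query with exactly one term and boost 1:
    // coord * queryNorm * tf * idf^2 * boost * norm.
    static double score(int freq, int docFreq, int numDocs, double fieldNorm) {
        double boost = 1.0;
        double idf = idf(docFreq, numDocs);
        // With a single term, the sum has only one entry: (idf * boost)^2.
        double sumOfSquaredWeights = (idf * boost) * (idf * boost);
        return coord(1, 1) * queryNorm(sumOfSquaredWeights)
                * tf(freq) * idf * idf * boost * fieldNorm;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: one indexed document that contains the
        // term 4 times, with an assumed field norm of 1.0.
        int freq = 4, docFreq = 1, numDocs = 1;
        System.out.println("tf    = " + tf(freq));
        System.out.println("idf   = " + idf(docFreq, numDocs));
        System.out.println("coord = " + coord(1, 1));
        System.out.println("score = " + score(freq, docFreq, numDocs, 1.0));
    }
}
```

Under these assumptions coord is 1 for a single-term query that matches, and queryNorm reduces to 1 / idf, so one of the two idf factors cancels and the score comes out as tf * idf * norm.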
