lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: getting frequency of a phrase within documents
Date Tue, 11 Apr 2006 17:54:05 GMT

If you use a custom Similarity class, its tf(float) function is used for
phrases to determine how the score should be computed from the number of
times the phrase appears in the document.

If you make tf an identity function, and modify the other functions in the
Similarity to return (mostly) constant values, then you can probably make a
Similarity class in which the score of each document is the number of times
the phrase appears.
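A toy model of that idea, assuming a simplified single-phrase score of the form tf(freq) * idf^2 * queryNorm * fieldNorm. The class and method names are hypothetical and only echo the Similarity hooks; this is not Lucene's scoring code. The one real detail borrowed is that DefaultSimilarity.tf(freq) is the square root of freq, so with default factors the score is not the raw count, while an identity tf with every other factor held at 1.0 gives back exactly the number of occurrences.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical toy model (not Lucene code) of how one phrase's score is
// assembled from Similarity-style factors.
class PhraseScoreModel {
    // Simplified shape: score = tf(freq) * idf^2 * queryNorm * fieldNorm
    static double score(DoubleUnaryOperator tf, double freq,
                        double idf, double queryNorm, double fieldNorm) {
        return tf.applyAsDouble(freq) * idf * idf * queryNorm * fieldNorm;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator defaultTf = Math::sqrt; // like DefaultSimilarity.tf
        DoubleUnaryOperator identityTf = f -> f;     // the "counting" variant

        double freq = 4.0; // the phrase occurs 4 times in this document

        // Default-style tf and arbitrary factors: the score is not the count.
        System.out.println(score(defaultTf, freq, 1.7, 0.4, 0.5));
        // Identity tf, every other factor held at 1.0: prints 4.0
        System.out.println(score(identityTf, freq, 1.0, 1.0, 1.0));
    }
}
```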

NOTE: this will really only work with exact phrases. For inexact (sloppy)
phrase matches, the value passed to tf is a sum that has already lost
information: the input might be 1.5, but you have no way of knowing if that's
two sloppy matches with a "freq" of 0.75 each, or three sloppy matches with a
"freq" of 0.5 each.
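The same loss shows up with concrete numbers under the default formula. In classic Lucene, DefaultSimilarity.sloppyFreq(int distance) returns 1 / (distance + 1), and the sloppy phrase scorer adds these values up before handing the total to tf. The hypothetical class below reproduces only that arithmetic (it is not Lucene code): three matches at distance 1 and one exact match plus one match at distance 1 both reach tf as 1.5.

```java
import java.util.List;

// Hypothetical illustration (not Lucene code) of why a summed sloppy
// frequency cannot be turned back into a match count.
class SloppyFreqAmbiguity {
    // Same formula as classic DefaultSimilarity.sloppyFreq(int distance):
    // an exact match (distance 0) counts 1.0, looser matches count less.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Only this sum is passed on to tf(float); the per-match distances,
    // and therefore the number of matches, are gone by then.
    static float summedFreq(List<Integer> matchDistances) {
        float sum = 0.0f;
        for (int d : matchDistances) {
            sum += sloppyFreq(d);
        }
        return sum;
    }

    public static void main(String[] args) {
        float three = summedFreq(List.of(1, 1, 1)); // three sloppy matches
        float two = summedFreq(List.of(0, 1));      // one exact + one sloppy
        // prints 1.5 1.5 equal=true
        System.out.println(three + " " + two + " equal=" + (three == two));
    }
}
```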

You might be better off using the SpanNearQuery class and its getSpans method.
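The span-based approach counts matches one at a time instead of reading them back out of a float score. The sketch below models that in plain Java under stated assumptions: hypothetical in-memory positional postings (term to document to positions), an exact in-order two-term matcher standing in for a SpanNearQuery with slop 0, and a counting loop over the resulting spans. The Span class only mirrors the doc/start/end information Lucene's Spans exposes; none of these names are Lucene API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model (not Lucene code) of counting phrase matches per document.
class PhraseSpanCounter {
    // One match: document id plus [start, end) token positions.
    static final class Span {
        final int doc, start, end;
        Span(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    // Finds exact, in-order matches of `first` immediately followed by `second`
    // in hypothetical positional postings: term -> doc -> positions.
    static List<Span> exactSpans(Map<String, Map<Integer, List<Integer>>> postings,
                                 String first, String second) {
        List<Span> spans = new ArrayList<>();
        Map<Integer, List<Integer>> firstDocs = postings.getOrDefault(first, Map.of());
        Map<Integer, List<Integer>> secondDocs = postings.getOrDefault(second, Map.of());
        // TreeMap keeps documents in increasing id order, like a postings walk.
        for (Map.Entry<Integer, List<Integer>> e : new TreeMap<>(firstDocs).entrySet()) {
            List<Integer> secondPositions = secondDocs.get(e.getKey());
            if (secondPositions == null) {
                continue; // second term absent from this document
            }
            Set<Integer> followers = new HashSet<>(secondPositions);
            for (int pos : e.getValue()) {
                if (followers.contains(pos + 1)) {
                    spans.add(new Span(e.getKey(), pos, pos + 2));
                }
            }
        }
        return spans;
    }

    // The counting loop: one integer increment per span, grouped by document.
    static Map<Integer, Integer> countPerDoc(List<Span> spans) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Span s : spans) {
            counts.merge(s.doc, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, List<Integer>>> postings = Map.of(
            "avoids", Map.of(0, List.of(3, 10), 1, List.of(5)),
            "deadlock", Map.of(0, List.of(4, 11), 1, List.of(9)));
        // prints {0=2}: doc 0 has "avoids deadlock" twice, doc 1 never
        System.out.println(countPerDoc(exactSpans(postings, "avoids", "deadlock")));
    }
}
```

Against a real index of this era, the counting loop would instead walk the Spans returned by getSpans(reader) on a SpanNearQuery built from two SpanTermQuery clauses, calling next() and tallying doc(). The spans API was reworked in later Lucene releases, so check the javadocs for the version in use.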



: Date: Tue, 11 Apr 2006 11:58:20 -0400
: From: Vishal Bathija <vishalbathija@gmail.com>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: getting frequency of a phrase within documents
:
: Hi,
: I am using a PhraseQuery and the returned Hits to get the number of
: documents the phrase appears in. I would like to know if there is any
: way I can get the number of times the phrase appears within each
: document.
:
: I am currently searching for the phrase "avoids deadlock":
:
: // match "avoids" immediately followed by "deadlock" in the "contents" field
: PhraseQuery query = new PhraseQuery();
: searcher = new IndexSearcher(rd);
: String temp = "avoids";
: String temp2 = "deadlock";
: Term synset2 = new Term("contents", temp);
: Term ss = new Term("contents", temp2);
: query.add(synset2);
: query.add(ss);
: Hits hits = searcher.search(query);
: // hits.length() is the number of matching documents, not occurrences
: System.out.println("number of hits=" + hits.length());
:
: Any help would be greatly appreciated.
:
:
:
:
: --
: Vishal Bathija
: Graduate Student
: Department of Computer Science & Systems Analysis
: Miami University
: Oxford,Ohio
: Phone: (513)-461-9239
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:



-Hoss



