lucene-java-user mailing list archives

From Shay Hummel <shay.hum...@gmail.com>
Subject Re: Tf and Df in lucene
Date Tue, 16 Jun 2015 03:28:54 GMT
Erick and Ahmet - thank you

Shay

On Mon, Jun 15, 2015 at 6:19 PM Ahmet Arslan <iorixxx@yahoo.com.invalid>
wrote:

> Hi,
>
> If you are interested in the summed tf values of multiple terms,
> I suggest extending the SimilarityBase class to return the raw freq as the score.
>
> @Override
> protected float score(BasicStats stats, float freq, float docLen) {
>     return freq; // raw term frequency, no length normalization
> }
>
> When you use this similarity and search with a three-term query, the scores
> will be the summed tf values. You can also extract additional information
> from the explain feature.
>
> Ahmet
>
>
>
>
> On Monday, June 15, 2015 5:50 PM, Shay Hummel <shay.hummel@gmail.com>
> wrote:
> Hi Ahmet
>
> Thank you for the reply.
> Can the term be a multi-word expression?
> For example:
> I want to find the term frequency / document frequency of "united states"
> (two terms) or "free speech zones" (three terms).
>
> Shay
>
>
> On Mon, Jun 15, 2015 at 4:55 PM Ahmet Arslan <iorixxx@yahoo.com.invalid>
> wrote:
>
> > Hi Hummel,
> >
> > regarding df,
> >
> > Term term = new Term(field, word);
> > // Collection-wide statistics for the term
> > TermStatistics termStatistics = searcher.termStatistics(term,
> >     TermContext.build(reader.getContext(), term));
> > System.out.println(word + "\t totalTermFreq \t " + termStatistics.totalTermFreq());
> > System.out.println(word + "\t docFreq \t " + termStatistics.docFreq());
> >
> > regarding tf,
> >
> > Term term = new Term(field, word);
> > Bits bits = MultiFields.getLiveDocs(reader); // used to skip deleted documents
> > PostingsEnum postingsEnum = MultiFields.getTermDocsEnum(reader, bits,
> >     field, term.bytes());
> >
> > if (postingsEnum == null) return; // field or term not present in the index
> >
> > while (postingsEnum.nextDoc() != PostingsEnum.NO_MORE_DOCS) {
> >     final int freq = postingsEnum.freq();   // tf(t, d) for the current document
> >     final int docID = postingsEnum.docID(); // the current document d
> > }
> >
> >
> > Ahmet
> >
> >
> >
> >
> > On Monday, June 15, 2015 9:12 AM, Shay Hummel <shay.hummel@gmail.com>
> > wrote:
> > Hi
> >
> > I was wondering, what is the easiest way to get the term frequency of a
> > term t in document d, namely tf(t,d)?
> > In the same spirit, what is the easiest way to get the document
> > frequency of a term in the collection, i.e. how many documents contain the
> > term t, namely df(t)?
> >
> > Regards,
> > Shay
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
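
For readers following the thread, here is a minimal plain-Java sketch of the quantities being discussed: tf(t,d), df(t), and the summed tf over several terms, with t allowed to be a multi-word expression such as "united states". It does not use Lucene at all. The class name TfDfSketch, its methods, and the naive lower-case/split tokenizer are assumptions made for illustration, and a real index would go through an analyzer and postings instead of rescanning the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TfDfSketch {

    // Hypothetical stand-in for an analyzer: lower-case, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // tf(t, d): number of positions in d where the token sequence of t starts.
    // Works for single terms and for multi-word expressions alike.
    static int tf(String expression, String document) {
        List<String> needle = tokenize(expression);
        List<String> doc = tokenize(document);
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + needle.size() <= doc.size(); i++) {
            if (doc.subList(i, i + needle.size()).equals(needle)) {
                count++;
            }
        }
        return count;
    }

    // df(t): number of documents in the corpus in which t occurs at least once.
    static int df(String expression, List<String> corpus) {
        int count = 0;
        for (String document : corpus) {
            if (tf(expression, document) > 0) {
                count++;
            }
        }
        return count;
    }

    // Sum of tf over several terms for one document, mirroring the
    // "raw freq as score" idea from the thread for a multi-term query.
    static int summedTf(List<String> terms, String document) {
        int sum = 0;
        for (String term : terms) {
            sum += tf(term, document);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
            "The United States protects free speech. United States courts agree.",
            "Free speech zones exist in the United States.",
            "Speech is free.");

        System.out.println("tf(united states, d0) = " + tf("united states", corpus.get(0)));
        System.out.println("df(united states)     = " + df("united states", corpus));
        System.out.println("df(free speech zones) = " + df("free speech zones", corpus));
        System.out.println("summed tf in d0       = "
            + summedTf(Arrays.asList("united", "states", "speech"), corpus.get(0)));
    }
}
```

Note that the summed tf of the individual words "united" and "states" is not the same thing as the tf of the phrase "united states": the phrase count requires the words to be adjacent and in order, which is why the sketch matches token sequences rather than adding up single-term counts.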
