lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sengly Heng" <sengly.h...@gmail.com>
Subject Re: TF-IDF API
Date Wed, 28 Mar 2007 12:20:06 GMT
Thanks but in my case I do not have the files. What I have is just a
collection of vectors of terms.

Does lucene provide any mean to index each vector of terms as a file? Or
there is a better way to do?

Thank everyone once again.

Regards,

Sengly


On 3/28/07, thomas arni <art@zhwin.ch> wrote:
>
> Hava a look at the "TermDocs" Interface in the API.
>
> You can get term frequency  with a open IndexReader
>
> TermDocs termDocs = reader.termDocs(term);
>
> where "term" represents the current Term.
>
> now you can call:
>
> termDocs.freq()
>
> to get the frequency of the term within the current document.
>
> For the calculation of the idf, you can use the provided formula from
> the "DefaultSimilarity".
> To get the document frequency, which is necessary to calculate the idf,
> you can call:
>
> reader.docFreq(term)
>
> Hope this helps...
>
> Thomas
>
>
> Sengly Heng wrote:
> > Hello Luceners,
> >
> > I have a collections of vector of terms (token) that I extracted from
> > files.
> > I am looking for ways to calculate TF/IDF of each term.
> >
> > I wanted to use Lucene to do this but Lucene is made for collections of
> > files and in my case I have already extracted those files into vector of
> > terms. I know it is not very difficult to implement this measurement
> > but I
> > guess there should be such API available. Does anyone of you know any
> > Java
> > API that directly handle this problem? or I have to implement from
> > scratch.
> >
> > Any idea would be highly appreciated.
> >
> > Thank you in advance.
> >
> > Best regards,
> >
> > Sengly
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message