lucene-general mailing list archives

From Antonio Calò <anton.c...@gmail.com>
Subject Re: Frequency Term of Composite words
Date Thu, 17 Dec 2009 10:54:54 GMT
Hi Ted.

Thank you very much for your feedback.

I can see the term frequency for each single term, but not for pairs or
longer sequences of terms taken together.

An example: "the quick brown fox jumps over the lazy dog. But the big dog
was sleeping. So the lazy dog didn't see the fox"

So, with your suggestion I'm able to find that tf("dog") = 3,
tf("fox") = 2, ... (each term is just a single word).

But it seems that TermFreqVector cannot answer questions like these:
tf("lazy dog") = 2, tf("quick brown") = 1.
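
For illustration, here is a minimal sketch in plain Java of the phrase counts
described above. It does not use any Lucene API: the class name PhraseCounter,
its methods, and the tokenization rule (lowercase, split on anything that is
not a letter, digit, or apostrophe) are assumptions made for this example. It
simply slides a window over the token list and counts exact matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
class PhraseCounter {

    // Lowercase the text and split it on every run of characters that is
    // not a letter, a digit, or an apostrophe (assumed tokenization rule).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Count how many times the phrase's tokens appear consecutively in the text.
    static int countPhrase(String text, String phrase) {
        List<String> doc = tokenize(text);
        List<String> wanted = tokenize(phrase);
        if (wanted.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + wanted.size() <= doc.size(); i++) {
            if (doc.subList(i, i + wanted.size()).equals(wanted)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog. But the big dog "
                + "was sleeping. So the lazy dog didn't see the fox";
        System.out.println(countPhrase(text, "lazy dog"));
        System.out.println(countPhrase(text, "quick brown"));
    }
}
```

Note that this sketch ignores sentence boundaries, so a phrase spanning a
full stop would still be counted, and it rescans the whole text for every
phrase, which is probably the slow part for a large set of concepts.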

Unfortunately, I've been asked to retrieve the occurrences of a set of
concepts in a document, and I was trying to use Lucene because my simple
mapping algorithm is too slow :(.

I'll try to see if I can do something with TermFreqVector or with the
Analyzer. Otherwise I'll look for another way :)
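
One possible direction on the Analyzer side, sketched here in plain Java
rather than with Lucene itself: treat every run of up to N consecutive words
as a term of its own, so that a single frequency map can answer both
single-word and multi-word lookups. As far as I know, Lucene's contrib
analyzers include a ShingleFilter that emits word n-grams of this kind, but
the code below does not use it. The class name ShingleFrequencies, the method
shingleFreqs, and the tokenization rule are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class ShingleFrequencies {

    // Build a map from every word n-gram (1 <= n <= maxSize) to its
    // number of occurrences in the text.
    static Map<String, Integer> shingleFreqs(String text, int maxSize) {
        // Assumed tokenization: lowercase, split on non-letter/digit/apostrophe runs.
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }

        Map<String, Integer> freqs = new HashMap<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder shingle = new StringBuilder();
            // Grow the n-gram one word at a time and count each length.
            for (int n = 0; n < maxSize && start + n < tokens.size(); n++) {
                if (n > 0) {
                    shingle.append(' ');
                }
                shingle.append(tokens.get(start + n));
                freqs.merge(shingle.toString(), 1, Integer::sum);
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog. But the big dog "
                + "was sleeping. So the lazy dog didn't see the fox";
        Map<String, Integer> freqs = shingleFreqs(text, 3);
        System.out.println(freqs.getOrDefault("lazy dog", 0));
        System.out.println(freqs.getOrDefault("quick brown fox", 0));
    }
}
```

The trade-off is a larger map (roughly maxSize entries per token position) in
exchange for constant-time lookups per concept, which may help when the set of
concepts is large.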

Antonio



2009/12/16 Ted Dunning <ted.dunning@gmail.com>

> You need the term frequency vector.
>
> See here
>
> http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/IndexReader.html#getTermFreqVector%28int,%20java.lang.String%29
>
> This is compatible in 3.0 as well:
>
> http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/index/IndexReader.html#getTermFreqVector%28int,%20java.lang.String%29
>
> Note the package change.
>
>
> On Wed, Dec 16, 2009 at 7:34 AM, Antonio Calò <anton.calo@gmail.com>
> wrote:
>
> > Hi all,
> >
> > I hope that you can help me with this.
> >
> > I'm looking for a fast way to obtain, for a given word, its term
> > frequency (I mean how many times it occurs in a single doc). I've been
> > looking into the mail archive and the LIA (Lucene in Action) book, and I
> > found something like this:
> >
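> > IndexSearcher index = new IndexSearcher(invertedIndexinRam);
> > Term term = new Term("doc", "quick");
> > // Note: docFreq counts the documents containing the term, not the
> > // number of times the term occurs inside one document.
> > int occurrence = index.docFreq(term);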
> >
> > OK, occurrence contains the occurrences of the word "quick" in the
> > index (in my case the index will contain only one document, for example
> > "the quick brown fox jumps over the lazy dog"). In this case the
> > occurrence will be 1. :)
> >
> > But now I need to retrieve the occurrences of a composite word, for
> > example "quick brown fox", and I'm having trouble working out how to do
> > this.
> >
> > Thanks in advance for your help.
> >
> > Best Regards.
> >
> > Antonio
> >
> >
> >
> > --
> > Antonio Calò
> > ------------------------------------------
> > Software Developer Engineer
> > @ Intellisemantic
> > Mail anton.calo@gmail.com
> > Tel. 011-56.90.429
> > ------------------------------------------
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>



-- 
Antonio Calò
------------------------------------------
Software Developer Engineer
@ Intellisemantic
Mail anton.calo@gmail.com
Tel. 011-56.90.429
------------------------------------------
