lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Term/Phrase frequencies
Date Fri, 07 May 2010 14:52:59 GMT
Well, counting frequency isn't the best approach. For instance, if a field
has 1,000 terms and 10 occurrences of your target, is that a better match
than a field with 10 terms and 5 occurrences of your target?

This kind of thing is already taken into account with Lucene scoring, you
might
want to look at this:
http://lucene.apache.org/java/2_4_0/scoring.html

<http://lucene.apache.org/java/2_4_0/scoring.html>Best
Erick

On Thu, May 6, 2010 at 11:48 PM, manjula wijewickrema
<manjula53@gmail.com>wrote:

> Hi Erik,
>
> Thanks for the reply. What I want to do is, to identify key terms and key
> phrases of a document according to their number of occurences in the
> document. Output should be the highest freequency words and (two or three
> word) phrases. For this purpose can I use Lucene?
>
> Thanks
> Manjula
>
> On Thu, May 6, 2010 at 6:09 PM, Erick Erickson <erickerickson@gmail.com
> >wrote:
>
> > Terms are relatively easy, see TermFreqVector in the JavaDocs.
> >
> > Phrases aren't as easy, before you go there, though, what is the
> > high-level problem you're trying to solve? Possibly this is an XY problem
> > (see http://people.apache.org/~hossman/#xyproblem).
> >
> > Best
> > Erick
> >
> > On Thu, May 6, 2010 at 6:39 AM, manjula wijewickrema <
> manjula53@gmail.com
> > >wrote:
> >
> > > Hi,
> > >
> > > I am new to Lucene. If I want to know the term or phrase frequency of
> an
> > > input document, will it be possible through Lucene?
> > >
> > > Thanks,
> > > Manjula
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message