lucene-java-user mailing list archives

From "Julien Nioche" <>
Subject Re: lucene for statistical analysis
Date Fri, 02 May 2003 10:03:35 GMT
Creating an index with Lucene is indeed a good idea ;-)

It's very easy to retrieve information about the most frequent Terms in the
index and the frequency of a given Term (e.g. using
IndexReader.termDocs(Term term)).
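To make the two statistics concrete, here is a minimal sketch built on a hypothetical in-memory inverted index. TermStatsSketch and its methods are illustrative names and not Lucene API. It separates document frequency (how many documents contain a term) from total frequency (how often the term occurs overall). The total is computed by summing per-document counts, which is roughly what you would do by iterating the TermDocs returned by termDocs() and adding up freq() in the Lucene API of the time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory inverted index; NOT Lucene's API.
// Illustrates document frequency vs. total term frequency and a
// "most frequent terms" listing.
class TermStatsSketch {
    // term -> (document id -> number of occurrences in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    // Naive whitespace tokenizer with lower-casing; a real Analyzer does more.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Number of documents that contain the term.
    int docFreq(String term) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    // Total occurrences: walk the term's postings and sum the per-document
    // counts (analogous to iterating TermDocs and adding up freq()).
    int totalFreq(String term) {
        Map<Integer, Integer> docs = postings.get(term);
        if (docs == null) return 0;
        int sum = 0;
        for (int freq : docs.values()) sum += freq;
        return sum;
    }

    // The k terms with the highest total frequency, ties broken alphabetically.
    List<String> mostFrequent(int k) {
        List<String> terms = new ArrayList<>(postings.keySet());
        terms.sort(Comparator.comparingInt((String t) -> totalFreq(t))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }
}
```

Note that mostFrequent() visits every term, so in this sketch a top-k report is a full pass over the term dictionary.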

But there's currently no method in the API to get the frequency of a
PhraseQuery. There was a discussion about that particular point a long time
ago (see ...). This is also on the list of future improvements.

I implemented it, but in an old version of Lucene. Because of the recent
changes to scoring, it has to be redone. The problem is that computing the
frequency of a PhraseQuery takes a lot of time (as it does in a regular
search).
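The cost of phrase frequency can be illustrated with a simplified sketch. It assumes a positional index held in memory. PhraseFreqSketch is a hypothetical name, and this is not Lucene's PhraseQuery or scorer code. A single term's count falls straight out of its postings, but a phrase count is not stored anywhere. It has to be computed by checking, document by document, whether the phrase's terms occur at consecutive positions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch, NOT Lucene's PhraseQuery/scorer implementation.
// A phrase count is not stored anywhere; it must be computed by checking
// term positions document by document.
class PhraseFreqSketch {
    // term -> (document id -> positions of the term in that document, ascending)
    private final Map<String, Map<Integer, List<Integer>>> positions = new HashMap<>();

    // Naive whitespace tokenizer; position = index of the token in the document.
    void addDocument(int docId, String text) {
        int pos = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            positions.computeIfAbsent(token, t -> new HashMap<>())
                     .computeIfAbsent(docId, d -> new ArrayList<>())
                     .add(pos++);
        }
    }

    // Total number of exact, in-order occurrences of the phrase in the index.
    int phraseFreq(String... phrase) {
        if (phrase.length == 0) return 0;
        Map<Integer, List<Integer>> firstTermDocs = positions.get(phrase[0]);
        if (firstTermDocs == null) return 0;
        int count = 0;
        for (Map.Entry<Integer, List<Integer>> entry : firstTermDocs.entrySet()) {
            int doc = entry.getKey();
            // Each occurrence of the first term is a candidate phrase start.
            for (int start : entry.getValue()) {
                boolean match = true;
                // Term i of the phrase must sit at position start + i in the same doc.
                for (int i = 1; i < phrase.length && match; i++) {
                    Map<Integer, List<Integer>> docs = positions.get(phrase[i]);
                    List<Integer> termPositions = docs == null ? null : docs.get(doc);
                    // Linear contains() keeps the sketch short; a real engine would
                    // merge the sorted position lists instead.
                    match = termPositions != null && termPositions.contains(start + i);
                }
                if (match) count++;
            }
        }
        return count;
    }
}
```

Even with a smarter merge of sorted position lists, the work grows with the number of positions read for every term in the phrase. That is why a phrase count costs far more than reading a single term's stored frequency.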

If you don't need frequencies for PhraseQueries, Lucene is a good
solution. Otherwise, changes will have to be made to Lucene.

I'll try to take a look at it soon and propose a patch to the core sources.

----- Original Message -----
From: "Andy Nauli" <>
To: <>
Sent: Friday, May 02, 2003 11:33 AM
Subject: lucene for statistical analysis

> hello,
> I am just starting to look at Lucene for my project.
> Before I proceed, I would like to know if it's a good idea to use Lucene for
> creating an index and also for performing statistical analysis on the index
> (e.g. most frequent words, number of appearances of a certain index token, etc.).
> If Lucene is not a good candidate, can anyone suggest an alternative?
> thanks in advance
> andy
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
