Creating an index for Lucene is indeed a good idea ;-)
It's very easy to retrieve informations about the most frequent Terms in the
index and the frequency of a given Term.
(e.g. using IndexReader.termDocs(Term term))
But there's currently no method in the API to get the frequency of a
PhraseQuery. There was a discussion about that particular point a long time
ago (see
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00101.html).
This is also in the list of future improvments
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=18932.
I implemented it, but in a old version of Lucene. Because of the
modifications made in the Scoring recently it has to be redone. The problem
is that computing the frequency of a PhraseQuery takes a lot of time (in a
regular search as well).
If you don't need frequencies for PhraseQueries - Lucene is a good
solution.Otherwise changes must be done in Lucene.
I'll try to take a look at it soon and propose a patch to the core sources.
----- Original Message -----
From: "Andy Nauli" <andy.nauli@utoronto.ca>
To: <lucene-user@jakarta.apache.org>
Sent: Friday, May 02, 2003 11:33 AM
Subject: lucene for statistical analysis
> hello,
>
> I am just starting looking at lucene for my project.
>
> Before I proceed, I would like to know if it's a good idea to use lucene
for
> creating index and also performing statistical analysis on the index (e.g.
> most frequent words, number of appearance of certain index token, etc.)
>
> if lucene is not a good candidate, can anyone suggest an alternatives ?
>
> thanks in advance
> andy
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
|