lucene-java-user mailing list archives

From "Nader Akhnoukh" <>
Subject Re: Phrase Frequency For Analysis
Date Thu, 22 Jun 2006 17:25:59 GMT
Yes, Chris is correct: the goal is to determine the most frequently occurring
phrases in a document compared to the frequency of those phrases in the
index. So there are only output phrases, no inputs.
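One way to make "frequent in this document compared to the index" concrete is a tf-idf style weight. The sketch below scores each phrase as (count in the document) × log(total documents / documents containing the phrase). That weighting, the class name `PhraseScorer`, and the idea that document frequencies are already available in a map are all illustrative assumptions, not anything proposed in the thread:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks a document's phrases against index-wide statistics.
public class PhraseScorer {

    // tf * log(N / df): high when a phrase is frequent in this document
    // but appears in few documents across the whole index.
    static double score(int tfInDoc, int docFreq, int numDocs) {
        return tfInDoc * Math.log((double) numDocs / Math.max(1, docFreq));
    }

    // Returns the k highest-scoring phrases of one document.
    static List<String> topPhrases(Map<String, Integer> docCounts,
                                   Map<String, Integer> indexDocFreq,
                                   int numDocs, int k) {
        List<String> phrases = new ArrayList<>(docCounts.keySet());
        Map<String, Double> scores = new HashMap<>();
        for (String p : phrases) {
            // Assumption: a phrase missing from the index is treated as df = 1.
            int df = indexDocFreq.getOrDefault(p, 1);
            scores.put(p, score(docCounts.get(p), df, numDocs));
        }
        phrases.sort(Comparator.comparingDouble((String p) -> scores.get(p)).reversed());
        return phrases.subList(0, Math.min(k, phrases.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = new HashMap<>();
        doc.put("the", 10);
        doc.put("term vector", 3);
        Map<String, Integer> df = new HashMap<>();
        df.put("the", 1000);
        df.put("term vector", 5);
        System.out.println(topPhrases(doc, df, 1000, 1)); // prints [term vector]
    }
}
```

A phrase that occurs in every document (like "the" above) gets a weight of zero no matter how often it repeats, which is the behavior the question seems to be after.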

Also performance is not really an issue, this would take place on an
irregular basis and could run overnight if need be.

So it sounds like the best approach would be to index all 1-, 2-, and 3-word
phrases. Does anyone know of an Analyzer that does this? And if I can
successfully index the phrases, would the term frequency vector contain all
the phrase combinations as terms along with their frequencies?
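To illustrate what "index all 1-, 2-, and 3-word phrases" would produce, here is a plain-Java sketch that emits every word n-gram up to a maximum length and counts it. It is not a Lucene Analyzer; the regex tokenizer and the `PhraseCounter` name are hypothetical stand-ins, assumed only to show which terms such an analyzer would hand to the index:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts all 1..maxN word phrases in a piece of text.
public class PhraseCounter {

    // Lowercase and split on non-word characters; a stand-in for a real Analyzer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // For each start position, grow the phrase one token at a time up to maxN words.
    static Map<String, Integer> countPhrases(String text, int maxN) {
        List<String> tokens = tokenize(text);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder phrase = new StringBuilder();
            for (int n = 1; n <= maxN && start + n <= tokens.size(); n++) {
                if (n > 1) {
                    phrase.append(' ');
                }
                phrase.append(tokens.get(start + n - 1));
                counts.merge(phrase.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countPhrases("the quick fox and the quick dog", 3);
        System.out.println(counts.get("the quick"));     // prints 2
        System.out.println(counts.get("quick fox and")); // prints 1
    }
}
```

If an analyzer emitted these strings as tokens, each phrase would be stored as an ordinary term, so a per-document term vector would list it with its frequency just like a single word. Note that the number of terms grows roughly threefold for maxN = 3.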

Andrzej, can you discuss your approach in a little more detail? Are you
suggesting manually traversing each document and running a search for each
phrase? That seems very intensive, as I have tens of thousands of documents.

