lucene-java-user mailing list archives

From Yannick Martel <mar...@codelutin.com>
Subject Top terms relevance from specific documents ?
Date Tue, 15 Dec 2015 16:51:18 GMT
Hi !

I am using (Java) Lucene to index data, and I want to produce a kind
of tag cloud for specific data.

I've found HighFreqTerms, which gets a top list of terms from *all
documents* (if I understand correctly). By the way, I overrode it to
be able to filter on several fields instead of only one.

But it does not really match my need: I'd like to get the most
frequently repeated terms in a single document (or in several specific documents).
For example, given a document with the fields "Title", "Summary" and
"Description", I'm trying to get the count of each term in them
(excluding the stop words removed by the Analyzer).
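A minimal sketch of the counting step described above, in plain Java with no Lucene dependency. Assumptions: each document is modelled as a map from field name to text, and the naive tokenizer and STOP_WORDS set are hypothetical stand-ins for a real Lucene Analyzer; the names TermCounts, tokenize and topTerms are illustrative only. Inside Lucene itself, the per-document counts would more likely come from stored term vectors (IndexReader.getTermVector) or from re-analyzing the stored field text, which this sketch does not show.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java sketch: count terms across selected fields of one or more
// documents and return the N most frequent ones. The tokenizer and stop-word
// set are simplified, hypothetical stand-ins for a Lucene Analyzer.
public class TermCounts {

    // Hypothetical stop-word set; a real Analyzer would supply its own.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "to", "in", "is", "for", "with"));

    // Naive tokenizer: lowercase, split on runs of non-letter/non-digit
    // characters, and drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Each document is a map from field name ("Title", ...) to its text.
    // Counts are summed over every requested field of every given document.
    static List<Map.Entry<String, Integer>> topTerms(
            List<Map<String, String>> docs, Collection<String> fields, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> doc : docs) {
            for (String field : fields) {
                String text = doc.get(field);
                if (text == null) {
                    continue; // field absent in this document
                }
                for (String term : tokenize(text)) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        // Sort by descending count, then alphabetically for a deterministic order.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("Title", "Tag cloud with Lucene");
        doc.put("Summary", "Build a tag cloud from the terms of a document");
        doc.put("Description", "Lucene indexes the terms; the cloud shows the top terms");
        List<Map.Entry<String, Integer>> top = topTerms(
                Collections.singletonList(doc),
                Arrays.asList("Title", "Summary", "Description"), 3);
        System.out.println(top); // prints [cloud=3, terms=3, lucene=2]
    }
}
```

Passing a list of documents covers both the single-document and the several-documents case; ties are broken alphabetically so that the resulting tag cloud is stable from run to run.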

I cannot find a way to do that: I looked at TopFieldCollector and
other collectors, but they seem to only give document scores :/

Finding documentation is not easy, I think, because many of the
questions/answers either don't correspond to my need or refer to old
versions (3.x, for example), and I'm feeling lost in all of this...


Hoping someone can guide me.

Regards,

-- 
Yannick Martel


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

