lucene-java-user mailing list archives

From Shouvik Bardhan <sbard...@gisfederal.com>
Subject Re: High frequency terms in results document....
Date Thu, 19 Feb 2015 13:44:08 GMT
Thanks for your input, Uchida. I will try that out. I wonder what the
magic sauce is in Luke's set of calls that lets it produce, say, the top 100
terms even from an index with 100 million docs (small docs, though, in my case).
It looks like it goes through every term, puts each one in a priority queue, and
takes the top N.
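
For illustration, the priority-queue idea can be sketched in plain Java with no Lucene dependency. This is only a sketch under assumptions: the class name TopTermsSketch, the method topN, and the sample counts are hypothetical, and the (term, frequency) pairs stand in for whatever a term enumeration over the index would yield. It is not the actual code in Luke or HighFreqTerms.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopTermsSketch {

    /** Returns the n entries with the largest counts, highest count first. */
    public static List<Map.Entry<String, Long>> topN(
            Iterable<Map.Entry<String, Long>> termFreqs, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap ordered by frequency: the head is the weakest current candidate.
        Comparator<Map.Entry<String, Long>> byFreq =
                (a, b) -> Long.compare(a.getValue(), b.getValue());
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(n, byFreq);

        for (Map.Entry<String, Long> e : termFreqs) {
            if (heap.size() < n) {
                heap.add(e);                       // still filling the first n slots
            } else if (e.getValue() > heap.peek().getValue()) {
                heap.poll();                       // evict the weakest candidate
                heap.add(e);
            }
        }

        // The heap never holds more than n entries; sort them for presentation.
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(byFreq.reversed());
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical term counts standing in for an index's term dictionary.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("lucene", 42L);
        counts.put("index", 17L);
        counts.put("term", 99L);
        counts.put("queue", 3L);

        for (Map.Entry<String, Long> e : topN(counts.entrySet(), 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Because the queue is capped at N entries, memory stays proportional to N rather than to the number of terms, and each term costs at most one O(log N) heap update. That would be consistent with a single pass over the terms staying cheap even for a very large index.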

regards.

On Thu, Feb 19, 2015 at 2:10 AM, Tomoko Uchida <tomoko.uchida.1111@gmail.com> wrote:

> Hi,
>
> I'm afraid there is no easy or straightforward way to meet your requirement.
> I would try creating a temporary, tiny in-memory index from the search results
> on the fly, and getting the top N terms from it with HighFreqTerms.
>
> http://lucene.apache.org/core/4_10_3/misc/org/apache/lucene/misc/HighFreqTerms.html
> (The logic is almost the same as Luke's top N terms feature.)
>
> I have not tried it and am not sure whether this approach is practical in
> terms of performance; it is just an idea...
>
> Hope this helps,
> Tomoko
>
> 2015-02-16 1:58 GMT+09:00 Shouvik Bardhan <sbardhan@gisfederal.com>:
>
> > Apologies if I have missed it in prior discussions, but I looked all over. I
> > looked at the Luke code, and it does find high frequency terms on the entire
> > index. I am trying to get the top N high frequency terms in the documents
> > returned from a search result. I came across something called
> > FilterIndexReader, but I don't think it is part of the 4.X codebase. Any pointer
> > is appreciated.
> >
>
