Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: error (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: 
 <CAHpHujkC2P2u-dnaOKcU3YaLjne7tVypsCp+BkDOXZshTc9NJQ@mail.gmail.com>
References: 
 <CAC5D3q70AmujEZpX82fGm1uBdmjfn7=GY4JNf9h4BJmOKsZDAw@mail.gmail.com>
	<CAHpHujkC2P2u-dnaOKcU3YaLjne7tVypsCp+BkDOXZshTc9NJQ@mail.gmail.com>
Date: Thu, 19 Feb 2015 08:44:08 -0500
Message-ID: 
 <CAC5D3q4f2va_fz-2HRQ_qnAuzbFM9YhnPd1hdFdQEDAioVx8LA@mail.gmail.com>
Subject: Re: High frequency terms in results document....
From: Shouvik Bardhan <sbardhan@gisfederal.com>
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=089e0118450a570b22050f7121dc

--089e0118450a570b22050f7121dc
Content-Type: text/plain; charset=UTF-8

Thanks for your input, Uchida. I will try that out. I wonder what the
magic sauce is in Luke's set of calls that lets it produce, say, the top 100
terms even from an index with 100 million docs (small docs in my case, though).
It looks like it goes through every term, puts them in a priority queue, and
takes the top N.
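
A minimal sketch of that priority-queue idea, in plain Java with no Lucene
dependency. The class TopTerms, the method topN, and the term-to-frequency map
are hypothetical names used only for illustration; this is not Luke's or
HighFreqTerms' actual code. The point it illustrates: the queue never holds
more than N entries, so memory stays bounded by N while the work is a single
pass over the terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only (not part of Lucene or Luke).
final class TopTerms {

    // Returns the n most frequent terms, most frequent first.
    // termFreqs stands in for "every term and its frequency"; in a real index
    // these would come from enumerating the term dictionary (assumption).
    static List<String> topN(Map<String, Integer> termFreqs, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap on frequency: the head is the weakest of the current top n.
        PriorityQueue<Map.Entry<String, Integer>> queue =
            new PriorityQueue<>(n, (a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> entry : termFreqs.entrySet()) {
            if (queue.size() < n) {
                queue.add(entry);
            } else if (entry.getValue() > queue.peek().getValue()) {
                // The new term beats the weakest kept term, so swap it in.
                queue.poll();
                queue.add(entry);
            }
        }
        // Polling yields ascending frequency; reverse to get descending order.
        while (!queue.isEmpty()) {
            result.add(queue.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("lucene", 42);
        freqs.put("index", 17);
        freqs.put("the", 5);
        System.out.println(topN(freqs, 2)); // prints [lucene, index]
    }
}
```

Terms with equal frequency come back in no guaranteed order in this sketch.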

Regards.

On Thu, Feb 19, 2015 at 2:10 AM, Tomoko Uchida <tomoko.uchida.1111@gmail.com> wrote:

> Hi,
>
> I'm afraid there is no easy or straightforward way to meet your requirement.
> I would try creating a temporary, tiny in-memory index from the search
> results on the fly, and getting the top N terms from it with HighFreqTerms.
>
> http://lucene.apache.org/core/4_10_3/misc/org/apache/lucene/misc/HighFreqTerms.html
> (The logic is almost the same as Luke's top N terms feature)
>
> I have not tried it and am not sure whether this approach is practical in
> terms of performance; it is just an idea...
>
> Hope this helps,
> Tomoko
>
> 2015-02-16 1:58 GMT+09:00 Shouvik Bardhan <sbardhan@gisfederal.com>:
>
> > Apologies if I have missed it in prior discussions, but I looked all over.
> > I looked at the Luke code and it does find high frequency terms on the
> > entire index. I am trying to get the top N high frequency terms in the
> > documents returned from a search result. I came across something called
> > FilterIndexReader but I don't think it is part of the 4.X codebase. Any
> > pointer is appreciated.
> >
>

--089e0118450a570b22050f7121dc--