lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: conditional High Freq Terms in Lucene index
Date Thu, 29 Mar 2012 17:42:54 GMT
You'd have to modify the HighFreqTerms sources...

Roughly...

First, make a bitset recording which docs are type A (e.g., using
FieldCache). Second, change HighFreqTerms so that, for each term, it
walks the postings and counts how many type A docs contain that term.
Then just use the rest of HighFreqTerms (priority queue, etc.) with
those counts.
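
A minimal sketch of those steps, in plain Java rather than against the
Lucene API: the Map of term to doc IDs is a hypothetical stand-in for the
index's terms and postings, and the class and method names
(FilteredHighFreqTerms, TermCount, topTerms) are made up for illustration.
In real code the bitset would be filled from the DocType values and the
postings would come from the index reader.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: the Map stands in for the index's term dictionary
// and postings lists. None of these names are Lucene classes.
class FilteredHighFreqTerms {

    /** A term plus the number of type A documents that contain it. */
    static final class TermCount {
        final String term;
        final int docCount;

        TermCount(String term, int docCount) {
            this.term = term;
            this.docCount = docCount;
        }
    }

    /** Returns up to n terms, ordered by descending count of type A docs. */
    static List<TermCount> topTerms(Map<String, int[]> postings, BitSet typeA, int n) {
        List<TermCount> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap on docCount: the weakest of the current top n sits at the head.
        PriorityQueue<TermCount> queue =
                new PriorityQueue<>(n, (a, b) -> Integer.compare(a.docCount, b.docCount));

        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            // Walk the postings, counting only docs flagged in the bitset.
            int count = 0;
            for (int docId : entry.getValue()) {
                if (typeA.get(docId)) {
                    count++;
                }
            }
            if (count == 0) {
                continue; // term never occurs in a type A doc
            }
            if (queue.size() < n) {
                queue.add(new TermCount(entry.getKey(), count));
            } else if (count > queue.peek().docCount) {
                queue.poll();
                queue.add(new TermCount(entry.getKey(), count));
            }
        }

        // Drain the heap (ascending), then reverse to get descending order.
        while (!queue.isEmpty()) {
            result.add(queue.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        // Assume docs 0, 2 and 3 are the ones with DocType = "A".
        BitSet typeA = new BitSet();
        typeA.set(0);
        typeA.set(2);
        typeA.set(3);

        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("lucene", new int[] {0, 1, 2, 3});
        postings.put("index", new int[] {1, 2});
        postings.put("term", new int[] {0, 3, 4});
        postings.put("other", new int[] {1, 4});

        for (TermCount tc : topTerms(postings, typeA, 2)) {
            System.out.println(tc.term + " " + tc.docCount);
        }
    }
}
```

Note this counts documents per term (a filtered docFreq), not total
occurrences; summing within-document frequencies instead would be a
different ranking.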

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 29, 2012 at 11:33 AM, starz10de <farag_ahmed@yahoo.com> wrote:
> Hi,
>
> I am using the HighFreqTerms class to compute the most frequent terms in the
> Lucene index, and it works well. However, I would like to compute the most
> frequent terms under a condition: not over all documents in the index, but
> only over documents of type “A”. Besides the “contents” field, the index
> also has a “DocType” (document type) field.
> So I want to compute the most frequent terms only where DocType=“A”.
>
> Any idea how to do this?
>
> Thanks
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3868066.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
