lucene-java-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: Total Freq for Bigrams, Trigrams, etc.
Date Tue, 02 Dec 2014 22:31:18 GMT
If you index the n-grams in their own field using ShingleFilter, the
terms in that field *are* the n-grams, so you can get their statistics
with the same term API you already use for single terms. Queries work
the same way when they are run against that field.

-Mike
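
To make the idea above concrete, here is a minimal plain-Java sketch of what
indexing shingles buys you. It does not use Lucene at all: the class
`ShingleCounts` and its methods `shingles`, `totalFreqs`, and `perMillion` are
hypothetical names invented for this illustration, and the in-memory map only
stands in for the per-term statistics that a shingled field would expose
through `TermsEnum.totalTermFreq()`. It also assumes that "per million" is
measured against the total number of bigrams; dividing by the number of
single-word tokens instead is an equally reasonable choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: mimics in memory what a shingled field gives you.
// None of these names are Lucene APIs.
final class ShingleCounts {

    // Build word n-grams ("shingles") of size n, joined by a single space.
    static List<String> shingles(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    // Total number of occurrences of each shingle across all documents,
    // i.e. the per-term total frequency once n-grams are the terms.
    static Map<String, Long> totalFreqs(List<List<String>> docs, int n) {
        Map<String, Long> freqs = new HashMap<>();
        for (List<String> doc : docs) {
            for (String shingle : shingles(doc, n)) {
                freqs.merge(shingle, 1L, Long::sum);
            }
        }
        return freqs;
    }

    // Occurrences per million, relative to whatever total the caller picks.
    static double perMillion(long count, long total) {
        return total == 0 ? 0.0 : count * 1_000_000.0 / total;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("new", "york", "is", "in", "new", "york", "state"),
            Arrays.asList("i", "love", "new", "york"));

        Map<String, Long> bigrams = totalFreqs(docs, 2);
        long total = bigrams.values().stream().mapToLong(Long::longValue).sum();
        long count = bigrams.getOrDefault("new york", 0L);

        System.out.println("\"new york\": " + count + " of " + total
            + " bigrams, " + perMillion(count, total) + " per million");
    }
}
```

Because each bigram is a single term here, looking up its corpus frequency is
one map access, which is the same reason a single-term lookup is enough once
the n-grams live in their own field.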

On 12/02/2014 03:38 PM, Peter Organisciak wrote:
> Is it possible to get a total corpus frequency for bigram queries or
> higher, i.e. how many times does the query occur in the corpus?
>
> I'm looking to implement a count of occurrences per million terms. I know
> that for a single term I can use `TermsEnum.totalTermFreq()`. Is there any
> comparable way to do so for a bigram or other simple query?
>
> Thank you,
>
> Peter
>

