lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] [Updated] (LUCENE-2749) Co-occurrence filter
Date Thu, 09 May 2013 23:06:01 GMT


Uwe Schindler updated LUCENE-2749:

    Fix Version/s: 4.4 (was: 4.3)
> Co-occurrence filter
> --------------------
>                 Key: LUCENE-2749
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 3.1, 4.0-ALPHA
>            Reporter: Steve Rowe
>            Priority: Minor
>             Fix For: 4.4
> The co-occurrence filter to be developed here will output sets of tokens that co-occur within a given window onto a token stream.
> These token sets can be ordered either lexically (to allow order-independent matching/counting) or positionally (e.g. sliding windows of positionally ordered co-occurring terms that include all terms in the window are called n-grams or shingles).
> The parameters to this filter will be: 
> * window size: this can be a fixed sequence length, sentence/paragraph context (these will require sentence/paragraph segmentation, which is not in Lucene yet), or over the entire token stream (full field width)
> * minimum number of co-occurring terms: >= 2
> * maximum number of co-occurring terms: <= window size
> * token set ordering (lexical or positional)
> One use case for co-occurring token sets is as candidates for collocations.
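
The behaviour described above can be illustrated with a minimal sketch in plain Python. It is not Lucene code and not the implementation proposed in this issue: the function name `cooccurrence_sets` and its parameter names are hypothetical, only the fixed-length window variant is covered (no sentence/paragraph segmentation), and each set is anchored on the first token of its window so that overlapping windows do not emit the same positions twice.

```python
from itertools import combinations


def cooccurrence_sets(tokens, window_size, min_terms=2, max_terms=None,
                      ordering="positional"):
    """Return token sets that co-occur within a fixed-length sliding window.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not part of any Lucene API.
    """
    if max_terms is None:
        max_terms = window_size
    # Mirror the constraints in the issue: min >= 2 and max <= window size.
    if not 2 <= min_terms <= max_terms <= window_size:
        raise ValueError("need 2 <= min_terms <= max_terms <= window_size")
    if ordering not in ("positional", "lexical"):
        raise ValueError("ordering must be 'positional' or 'lexical'")

    results = []
    for start, first in enumerate(tokens):
        # Tokens that share a window beginning at `start` with `first`.
        rest = tokens[start + 1:start + window_size]
        for size in range(min_terms, max_terms + 1):
            # Anchoring on `first` emits each combination of positions once.
            for others in combinations(rest, size - 1):
                token_set = (first,) + others
                if ordering == "lexical":
                    # Sorting makes the set independent of token order.
                    token_set = tuple(sorted(token_set))
                results.append(token_set)
    return results


if __name__ == "__main__":
    stream = ["new", "york", "city"]
    # min = max = window size yields contiguous shingles (here, bigrams).
    print(cooccurrence_sets(stream, window_size=2))
    # Full field width: the window covers the entire token stream.
    print(cooccurrence_sets(stream, window_size=len(stream), ordering="lexical"))
```

With positional ordering and the minimum and maximum both equal to the window size, the output reduces to ordinary shingles. With lexical ordering, `("york", "new")` and `("new", "york")` collapse to the same tuple, which is what makes order-independent counting possible.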

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators