lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: [jira] Commented: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter
Date Thu, 04 Sep 2008 02:36:35 GMT
Or just remove the generics, right?

On Sep 3, 2008, at 5:09 PM, Karl Wettin (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628132#action_12628132 ]
>
> Karl Wettin commented on LUCENE-1320:
> -------------------------------------
>
> OK. Either remove it or place it in some alternative contrib module?
> The first choice is obviously the easiest.
>
>> ShingleMatrixFilter, a three dimensional permutating shingle filter
>> -------------------------------------------------------------------
>>
>>                Key: LUCENE-1320
>>                URL: https://issues.apache.org/jira/browse/LUCENE-1320
>>            Project: Lucene - Java
>>         Issue Type: New Feature
>>         Components: contrib/analyzers
>>   Affects Versions: 2.3.2
>>           Reporter: Karl Wettin
>>           Assignee: Karl Wettin
>>           Priority: Blocker
>>            Fix For: 2.4
>>
>>        Attachments: LUCENE-1320.txt, LUCENE-1320.txt, LUCENE-1320.txt
>>
>>
>> Backed by a column-focused matrix that creates all permutations of
>> shingle tokens in three dimensions, i.e. it handles multi-token
>> synonyms.
>> It could, for instance, in some cases be used to replace 0-slop phrase
>> queries with something speedier.
>> {code:java}
>> Token[][][]{
>>  {{hello}, {greetings, and, salutations}},
>>  {{world}, {earth}, {tellus}}
>> }
>> {code}
>> passes the following test with 2-3 grams:
>> {code:java}
>> assertNext(ts, "hello_world");
>> assertNext(ts, "greetings_and");
>> assertNext(ts, "greetings_and_salutations");
>> assertNext(ts, "and_salutations");
>> assertNext(ts, "and_salutations_world");
>> assertNext(ts, "salutations_world");
>> assertNext(ts, "hello_earth");
>> assertNext(ts, "and_salutations_earth");
>> assertNext(ts, "salutations_earth");
>> assertNext(ts, "hello_tellus");
>> assertNext(ts, "and_salutations_tellus");
>> assertNext(ts, "salutations_tellus");
>> {code}
>> The patch also contains tests of varying complexity that demonstrate
>> offsets, position increments (posincr), payload boost calculation and
>> construction of a matrix from a token stream.
>> The matrix attempts to use as little memory as possible by seeking
>> no more than maximumShingleSize columns forward in the stream and
>> clearing up unused resources (columns and unique token sets). It can
>> still be optimized quite a bit, though.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
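
For readers following the quoted issue description, the 2-3 gram expansion of the example matrix can be sketched roughly as below. This is only an illustration, not the code from the LUCENE-1320 patch: the class name ShingleMatrixSketch, the method shingles and its minSize/maxSize parameters are hypothetical, plain Strings stand in for Token, and the de-duplication of repeated shingles is an assumption inferred from the quoted test output. It also uses current Java (generics, String.join) for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; NOT the ShingleMatrixFilter implementation.
class ShingleMatrixSketch {

    /**
     * columns[c][r] is the token sequence stored in row r of column c.
     * Walks every combination of one row per column (first column varies
     * fastest), flattens it, and collects all n-grams of the given sizes.
     */
    static List<String> shingles(String[][][] columns, int minSize, int maxSize) {
        // Assumption: a shingle already emitted is not emitted again.
        Set<String> seen = new LinkedHashSet<>();
        int[] rowIndex = new int[columns.length];
        while (true) {
            // Flatten the currently selected row of each column.
            List<String> tokens = new ArrayList<>();
            for (int c = 0; c < columns.length; c++) {
                for (String t : columns[c][rowIndex[c]]) {
                    tokens.add(t);
                }
            }
            // Emit every shingle of minSize..maxSize tokens at each start.
            for (int start = 0; start < tokens.size(); start++) {
                for (int size = minSize;
                     size <= maxSize && start + size <= tokens.size();
                     size++) {
                    seen.add(String.join("_", tokens.subList(start, start + size)));
                }
            }
            // Advance to the next row combination, odometer style.
            int c = 0;
            while (c < columns.length && ++rowIndex[c] == columns[c].length) {
                rowIndex[c] = 0;
                c++;
            }
            if (c == columns.length) {
                break; // every combination has been visited
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String[][][] matrix = {
            {{"hello"}, {"greetings", "and", "salutations"}},
            {{"world"}, {"earth"}, {"tellus"}}
        };
        for (String shingle : shingles(matrix, 2, 3)) {
            System.out.println(shingle);
        }
    }
}
```

With the matrix from the issue and sizes 2 to 3, this sketch yields the same twelve shingles, in the same order, as the assertNext calls quoted above. Unlike the filter described in the issue, it holds the whole matrix in memory and ignores offsets, position increments and payloads.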

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
