lucene-dev mailing list archives

From "Karl Wettin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter
Date Wed, 03 Sep 2008 21:09:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628132#action_12628132 ]

Karl Wettin commented on LUCENE-1320:
-------------------------------------

OK. Either remove it or place it in some alternative contrib module? The first choice is
obviously the easiest.

> ShingleMatrixFilter, a three dimensional permutating shingle filter
> -------------------------------------------------------------------
>
>                 Key: LUCENE-1320
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1320
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>    Affects Versions: 2.3.2
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>            Priority: Blocker
>             Fix For: 2.4
>
>         Attachments: LUCENE-1320.txt, LUCENE-1320.txt, LUCENE-1320.txt
>
>
> Backed by a column-focused matrix that creates all permutations of shingle tokens in three dimensions, i.e. it handles multi-token synonyms.
> Could, for instance, in some cases be used to replace 0-slop phrase queries with something speedier.
> {code:java}
> Token[][][]{
>   {{hello}, {greetings, and, salutations}},
>   {{world}, {earth}, {tellus}}
> }
> {code}
> passes the following test with 2-3 grams:
> {code:java}
> assertNext(ts, "hello_world");
> assertNext(ts, "greetings_and");
> assertNext(ts, "greetings_and_salutations");
> assertNext(ts, "and_salutations");
> assertNext(ts, "and_salutations_world");
> assertNext(ts, "salutations_world");
> assertNext(ts, "hello_earth");
> assertNext(ts, "and_salutations_earth");
> assertNext(ts, "salutations_earth");
> assertNext(ts, "hello_tellus");
> assertNext(ts, "and_salutations_tellus");
> assertNext(ts, "salutations_tellus");
> {code}
> Contains tests of varying complexity that demonstrate offsets, position increments (posincr), payload boost calculation, and construction of a matrix from a token stream.
> The matrix attempts to hog as little memory as possible by seeking no more than maximumShingleSize columns forward in the stream and clearing up unused resources (columns and unique token sets). It can still be optimized quite a bit, though.
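
For readers who want to see how the permutations in the quoted test arise, here is a minimal sketch in plain Java. It is not the ShingleMatrixFilter code from the attached patch and does not use any Lucene classes. The class name ShingleMatrixSketch, the method shingles, the String[column][row][token] layout, and the "_" separator handling are all assumptions made for illustration. It ignores offsets, position increments, and payloads, and it assumes every column has at least one row.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the patch's implementation.
final class ShingleMatrixSketch {

    // matrix[column][row][token]: each row is one alternative (e.g. a synonym) for its column.
    static List<String> shingles(String[][][] matrix, int minSize, int maxSize) {
        Set<String> seen = new LinkedHashSet<>();   // keeps first-seen order, drops repeats
        int[] rowIndex = new int[matrix.length];    // currently selected row per column

        while (true) {
            // Flatten the selected row of every column into one token sequence.
            List<String> tokens = new ArrayList<>();
            for (int c = 0; c < matrix.length; c++) {
                for (String token : matrix[c][rowIndex[c]]) {
                    tokens.add(token);
                }
            }

            // Emit every n-gram of minSize..maxSize tokens from this sequence.
            for (int start = 0; start < tokens.size(); start++) {
                for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                    seen.add(String.join("_", tokens.subList(start, start + size)));
                }
            }

            // Advance to the next row combination; the first column varies fastest.
            int c = 0;
            while (c < matrix.length) {
                rowIndex[c]++;
                if (rowIndex[c] < matrix[c].length) {
                    break;
                }
                rowIndex[c] = 0;
                c++;
            }
            if (c == matrix.length) {
                break; // every combination has been visited
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String[][][] matrix = {
            {{"hello"}, {"greetings", "and", "salutations"}},
            {{"world"}, {"earth"}, {"tellus"}}
        };
        for (String shingle : shingles(matrix, 2, 3)) {
            System.out.println(shingle);
        }
    }
}
```

Under these assumptions, running main with the example matrix and 2-3 grams prints the twelve shingles in the same order as the assertNext calls quoted above. The real filter works on a token stream and bounds memory by looking at most maximumShingleSize columns ahead, which this sketch does not attempt.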

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

