lucene-dev mailing list archives

From "DM Smith (JIRA)" <>
Subject [jira] Commented: (LUCENE-1813) Add option to ReverseStringFilter to mark reversed tokens
Date Mon, 17 Aug 2009 14:59:15 GMT


DM Smith commented on LUCENE-1813:

I like the idea of a constant and of presenting it as the default. I suggest that alternatives
be listed in the JavaDoc.

I have some texts that use PUA (Private Use Area) code points until Unicode includes the
characters (e.g. Myanmar text), so I'm glad that allowing a choice of marker avoids a potential
conflict there. I think use of the PUA should be left to the text author.

As my texts are all derived from XML, I like the use of a character that is not allowed in
XML. I think U+0001 is just fine, even if it is not ideal from a purity perspective.
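
To make the two criteria above concrete, here is a minimal sketch (not part of the patch) of
how a candidate marker could be checked: it should fall outside the XML 1.0 Char production
and outside the Private Use Area. The class and method names are hypothetical and chosen only
for illustration.

```java
final class MarkerChoiceSketch {
    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True for code points that Unicode reserves for private agreements.
    static boolean isPrivateUse(int cp) {
        return Character.getType(cp) == Character.PRIVATE_USE;
    }

    // Assumption for this sketch: a marker is "safe" for XML-derived text
    // when it can neither appear in XML 1.0 nor collide with PUA usage.
    static boolean isSafeMarker(int cp) {
        return !isXml10Char(cp) && !isPrivateUse(cp);
    }

    public static void main(String[] args) {
        System.out.println(isSafeMarker(0x0001)); // control character, not XML 1.0
        System.out.println(isSafeMarker(0xE000)); // first BMP private-use code point
    }
}
```

Under these assumptions U+0001 passes the check, while a PUA code point such as U+E000 does
not, which matches the reasoning above.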

Some of my texts have BIDI markers, and while these will be stripped by filters, I don't think
this use is analogous.

> Add option to ReverseStringFilter to mark reversed tokens
> ---------------------------------------------------------
>                 Key: LUCENE-1813
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Andrzej Bialecki 
>            Assignee: Robert Muir
>             Fix For: 2.9
>         Attachments: LUCENE-1813.patch, reverseMark-2.patch, reverseMark.patch
> This patch implements additional functionality in the filter to "mark" reversed tokens
> with a special marker character (Unicode 0001). This is useful when indexing both straight
> and reversed tokens (e.g. to implement efficient leading wildcards search).
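
The marking idea in the description can be sketched in plain Java, without the Lucene API.
This is an illustration under assumptions, not the code in the attached patches: the class and
method names are hypothetical, and placing the marker at the start of the reversed token is
assumed here so that a leading-wildcard query can be rewritten as a prefix match.

```java
final class MarkedReverseSketch {
    // Marker named in the issue description (U+0001), kept as a constant
    // so that callers can substitute another character.
    static final char DEFAULT_MARKER = '\u0001';

    // Reverses the token and puts the marker in front, so marked (reversed)
    // terms cannot be confused with straight terms in the same field.
    static String reverseWithMarker(String token, char marker) {
        return marker + new StringBuilder(token).reverse().toString();
    }

    // Hypothetical helper: turns the literal part of a leading-wildcard
    // query (the "try" in "*try") into a prefix over the marked terms.
    static String leadingWildcardToPrefix(String suffix, char marker) {
        return reverseWithMarker(suffix, marker);
    }

    public static void main(String[] args) {
        String indexed = reverseWithMarker("country", DEFAULT_MARKER);
        String prefix = leadingWildcardToPrefix("try", DEFAULT_MARKER);
        // A prefix comparison on the reversed form stands in for "*try".
        System.out.println(indexed.startsWith(prefix));
    }
}
```

Because straight tokens never start with the marker, a prefix search built this way only
matches the reversed copies, which is what makes indexing both forms in one field workable.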

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
