lucene-java-user mailing list archives

From Brandon Mintern <>
Subject Re: Preserving TokenFilters
Date Mon, 12 Mar 2012 16:51:36 GMT
Everything we've read indicates that heavy Lucene users inevitably end
up writing their own TokenFilters. We did this ourselves a month or two
ago, and it really wasn't too bad. Just make sure that you work against
the latest Lucene release when you're writing your own filter. There's
a splitting filter that could serve as a good reference if you need to
emit multiple tokens at the same position.
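
As a rough illustration of emitting an extra token at the same position,
here is a library-free sketch of the pattern. The Token, TokenSource and
PreservingFilter types are hypothetical stand-ins, not Lucene classes. In
a real TokenFilter the same idea is usually expressed by saving the
attribute state with captureState(), restoring it with restoreState() on
the next incrementToken() call, and setting the position increment to 0
through PositionIncrementAttribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Toy model of a "preserve original" filter; none of these types are Lucene classes. */
final class PreserveOriginalSketch {

    /** Hypothetical stand-in for a token: its text plus a position increment. */
    static final class Token {
        final String term;
        final int positionIncrement; // 0 means "same position as the previous token"

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /** Hypothetical pull-based stream: returns the next token, or null when exhausted. */
    interface TokenSource {
        Token next();
    }

    /** Emits each upstream token unchanged, then a transformed copy at the same position. */
    static final class PreservingFilter implements TokenSource {
        private final TokenSource input;
        private final UnaryOperator<String> transform;
        private Token pending; // transformed variant waiting to be emitted

        PreservingFilter(TokenSource input, UnaryOperator<String> transform) {
            this.input = input;
            this.transform = transform;
        }

        @Override
        public Token next() {
            if (pending != null) {           // emit the buffered variant first
                Token variant = pending;
                pending = null;
                return variant;
            }
            Token original = input.next();   // otherwise pull from the parent
            if (original == null) {
                return null;
            }
            String changed = transform.apply(original.term);
            if (!changed.equals(original.term)) {
                pending = new Token(changed, 0); // stacked on the original's position
            }
            return original;
        }
    }

    /** Wraps a fixed list of terms as a source, one position per term. */
    static TokenSource fromTerms(String... terms) {
        Iterator<String> it = Arrays.asList(terms).iterator();
        return () -> it.hasNext() ? new Token(it.next(), 1) : null;
    }

    /** Pulls every token out of a source and renders it as text. */
    static List<String> drain(TokenSource source) {
        List<String> out = new ArrayList<>();
        for (Token t = source.next(); t != null; t = source.next()) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        TokenSource lowered = new PreservingFilter(
                fromTerms("Lucene", "index"), s -> s.toLowerCase(Locale.ROOT));
        System.out.println(drain(lowered)); // [Lucene(+1), lucene(+0), index(+1)]
    }
}
```

Because the filter still pulls from its parent and only buffers one
pending variant, two of these can be chained (lowercasing, then
reversing) and every variant stays at the original token's position.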

We referred to "Lucene in Action" (second edition) when writing ours.
While helpful, it was a bit out of date. Whatever reference you use
(source code or a how-to), check that it matches the Lucene release
you are targeting.

On Mon, Mar 12, 2012 at 9:47 AM, Alan Woodward
<> wrote:
> Hello,
> I have a number of operations that I want to apply to a TokenStream, supplementing the
> original tokens with modified forms.  For example, I want to reverse tokens, to allow prefix
> wildcard queries, and I want to index both lowercased and original terms.
> I initially tried to wrap ReverseStringFilter and LowerCaseFilter with a generic 'preserve
> original token' filter, but this doesn't work, as TokenFilter chaining works by pulling tokens
> from parents, and I somehow need to push them into children.  So I tried subclassing the
> filters instead, but of course they're both final…
> Is there already some way of doing this that I'm missing?  Or will I just have to copy'n'paste
> RSFilter and LCFilter to my own package, and add the preserving logic myself?
> (I'm aware that there's a Solr filter, ReversedWildcardFilter, that will do part of this
> for me, but I was hoping to only use lucene classes).
> Thanks,
> Alan Woodward
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
