lucene-dev mailing list archives

From Chuck Williams <ch...@manawiz.com>
Subject Re: Prefix and general wildcards
Date Sat, 10 Jun 2006 17:50:53 GMT
Doug Cutting wrote on 06/09/2006 08:00 AM:
> Chuck Williams wrote:
>> one simple and substantial optimization is
>> to support a token filter for term vectors, i.e. pass tokens through an
>> additional filter for addition to term vectors.
>
> Why not instead add the rotated and/or reversed tokens to a different
> field that does not store vectors?
>
I'm running into issues with the separate-field approach.  It would
seem to require either rereading the content or storing all of the
reversed/rotated tokens in a data structure so they can be replayed
into the second field later.  Both of these are performance problems,
and in my app rereading is not even practical.  Some fields are entire
large documents, and requirements prohibit any truncation.  The content
is streamed to the indexer through SOAP, which is what makes rereading
a problem.

It seems easiest and most efficient to have an additional filter on the
tokens that go into a term vector.  Am I missing an easier way to set up
a separate field?
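
To make the idea concrete, here is a rough sketch in plain Java.  It
uses no Lucene classes, and every name in it (TermVectorFilterSketch,
FieldPostings, rotations, consume) is hypothetical, as is the choice of
'$' as an end marker for rotated tokens.  It only shows the shape of
the proposal: one pass over the token stream, with every token going to
the index and only the tokens that pass an extra filter going to the
term vector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch only: plain Java, not Lucene API. All names are hypothetical.
class TermVectorFilterSketch {

    /** What a single pass over one field's tokens produces. */
    static final class FieldPostings {
        final List<String> indexedTokens = new ArrayList<>();
        final List<String> termVectorTokens = new ArrayList<>();
    }

    /**
     * All rotations of token + '$'. The '$' end marker is an assumption
     * made for this sketch; it lets rotated tokens be told apart from
     * the original ones.
     */
    static List<String> rotations(String token) {
        String marked = token + "$";
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    /**
     * Consumes the stream once. Every token is indexed; only tokens
     * accepted by termVectorFilter are added to the term vector.
     */
    static FieldPostings consume(Iterable<String> tokens,
                                 Predicate<String> termVectorFilter) {
        FieldPostings postings = new FieldPostings();
        for (String token : tokens) {
            postings.indexedTokens.add(token);
            if (termVectorFilter.test(token)) {
                postings.termVectorTokens.add(token);
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        // Stand-in for the analyzer output: each word plus its rotations.
        List<String> stream = new ArrayList<>();
        for (String word : new String[] {"cat", "dog"}) {
            stream.add(word);
            stream.addAll(rotations(word));
        }
        // Keep rotated tokens (those containing '$') out of the term vector.
        FieldPostings postings = consume(stream, t -> t.indexOf('$') < 0);
        System.out.println("indexed:     " + postings.indexedTokens);
        System.out.println("term vector: " + postings.termVectorTokens);
    }
}
```

With this arrangement the content is read once and nothing has to be
buffered for a second field; the rotated tokens simply never reach the
term vector.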

I understand the desire to not add facilities to Lucene when there is an
existing method to achieve the same end, but it is not clear that using
an additional field is a practical approach.  It also seems that in
general the tokens useful in a term vector are only a subset of those
useful in the index -- at least this is the case for my app.

Thanks for any guidance,

Chuck

