Message-ID: <48B586FF.9060001@getopt.org>
Date: Wed, 27 Aug 2008 18:55:27 +0200
From: Andrzej Bialecki <ab@getopt.org>
To: java-dev@lucene.apache.org
Subject: Re: Analyzer and Fieldable, different stored and indexed values
References: <48B56902.2060503@getopt.org>
 <C1307C00-D4CE-4914-B93B-FEFC784DBBA0@apache.org>
In-Reply-To: <C1307C00-D4CE-4914-B93B-FEFC784DBBA0@apache.org>

Grant Ingersoll wrote:
> If I'm understanding correctly...
> 
> What about a SinkTokenizer that is backed by a Reader/Field instead of 
> the current one that stores it all in a List?  This is more or less the 
> use case for the Tee/Sink implementations, w/ the exception that we 
> didn't plan for the Sink being too large, but that is easily overcome, IMO.
> 
> That is, you use a TeeTokenFilter that adds to your Sink, which 
> serializes to some storage, and then your SinkTokenizer just 
> unserializes.  No need to change Fieldable at all or anything else
> 
> Or maybe just a Tokenizer that is backed by a Field would work and uses 
> a TermEnum on the Field to serve up next() for the TokenStream.
> 
> Just thinking out loud...
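
For reference, the tee/sink flow described in the quoted text can be sketched in plain Java. This is only a rough model and deliberately avoids Lucene classes: TokenSource, TeeToWriter, ReaderBackedSink, fromList and drain are hypothetical names made up for illustration, and the newline-delimited storage format is an assumption that only holds if tokens contain no line breaks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of the tee/sink idea; none of these types are Lucene classes.
class TeeSinkSketch {

    /** Hypothetical stand-in for a token stream: next() returns null when exhausted. */
    interface TokenSource {
        String next() throws IOException;
    }

    /** "Tee": passes tokens through unchanged and writes a copy of each to storage. */
    static final class TeeToWriter implements TokenSource {
        private final TokenSource input;
        private final Writer storage;

        TeeToWriter(TokenSource input, Writer storage) {
            this.input = input;
            this.storage = storage;
        }

        @Override
        public String next() throws IOException {
            String token = input.next();
            if (token != null) {
                // Assumed format: one token per line (tokens must not contain line breaks).
                storage.write(token);
                storage.write('\n');
            }
            return token;
        }
    }

    /** "Sink": replays tokens from storage one at a time instead of from an in-memory List. */
    static final class ReaderBackedSink implements TokenSource {
        private final BufferedReader reader;

        ReaderBackedSink(Reader storage) {
            this.reader = new BufferedReader(storage);
        }

        @Override
        public String next() throws IOException {
            return reader.readLine(); // null at end of storage
        }
    }

    /** Wraps a list as a TokenSource, standing in for the original analysis chain. */
    static TokenSource fromList(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Consumes a source completely and returns the tokens it produced. */
    static List<String> drain(TokenSource source) throws IOException {
        List<String> out = new ArrayList<>();
        for (String t = source.next(); t != null; t = source.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        StringWriter storage = new StringWriter(); // stands in for real storage
        TokenSource tee = new TeeToWriter(
                fromList(Arrays.asList("stored", "and", "indexed")), storage);
        List<String> indexed = drain(tee);
        List<String> replayed = drain(
                new ReaderBackedSink(new StringReader(storage.toString())));
        System.out.println(indexed.equals(replayed));
    }
}
```

The sink here reads lazily from a Reader, so nothing is accumulated in a List; whether the same shape maps cleanly onto the real TeeTokenFilter/SinkTokenizer classes would need to be checked against the Lucene source.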

Actually, the scenario is more complicated, because I need to implement 
this as a Solr FieldType. Besides, wouldn't this mean that I can't 
store the original value, because setting the tokenStream on a Field 
automatically makes it un-stored?

Anyway, thanks for the hint, I'll check if I can do it this way. As for 
the other points about the new Analyzer API, I still think it would offer 
more flexibility than the current API, for a minimal cost in compatibility 
and likely no cost in performance.

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

