From: Andrzej Bialecki
Date: Wed, 27 Aug 2008 18:55:27 +0200
To: java-dev@lucene.apache.org
Subject: Re: Analyzer and Fieldable, different stored and indexed values
Message-ID: <48B586FF.9060001@getopt.org>
References: <48B56902.2060503@getopt.org>

Grant Ingersoll wrote:
> If I'm understanding correctly...
>
> What about a SinkTokenizer that is backed by a Reader/Field instead of
> the current one that stores it all in a List? This is more or less the
> use case for the Tee/Sink implementations, w/ the exception that we
> didn't plan for the Sink being too large, but that is easily overcome, IMO.
>
> That is, you use a TeeTokenFilter that adds to your Sink, which
> serializes to some storage, and then your SinkTokenizer just
> unserializes. No need to change Fieldable at all or anything else
>
> Or maybe just a Tokenizer that is backed by a Field would work and uses
> a TermEnum on the Field to serve up next() for the TokenStream.
>
> Just thinking out loud...

Actually, the scenario is more complicated, because I need to implement
this as a Solr FieldType ... Besides, wouldn't this mean that I can't
store the original value, because I'm setting the tokenStream on a Field
(which automatically makes it un-stored)?

Anyway, thanks for the hint, I'll check if I can do it this way.

As for the other points about the new Analyzer API: I still think it
would offer more flexibility than the current API, for a minimal cost in
compatibility, and likely no cost in performance.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
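
For reference, the tee/sink idea from the quoted suggestion (pass tokens
through while serializing them to some storage, then replay them by
unserializing) can be sketched without any Lucene dependency. This is
only an illustration of the pattern, not the Lucene TeeTokenFilter or
SinkTokenizer API: every class and method name below (SimpleTokenStream,
ListTokenStream, TeeStream, SinkReplayStream, TeeSinkSketch) is
hypothetical, tokens are reduced to plain strings, and a byte array
stands in for "some storage".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token stream: next() returns null when exhausted. */
interface SimpleTokenStream {
    String next() throws IOException;
}

/** Serves tokens from an in-memory list (stands in for a real tokenizer). */
final class ListTokenStream implements SimpleTokenStream {
    private final Iterator<String> it;

    ListTokenStream(List<String> tokens) {
        this.it = tokens.iterator();
    }

    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

/** Passes tokens through unchanged while also serializing each one to a sink. */
final class TeeStream implements SimpleTokenStream {
    private final SimpleTokenStream input;
    private final DataOutputStream sink;

    TeeStream(SimpleTokenStream input, DataOutputStream sink) {
        this.input = input;
        this.sink = sink;
    }

    public String next() throws IOException {
        String token = input.next();
        if (token != null) {
            sink.writeUTF(token); // length-prefixed, so tokens can be read back one by one
        }
        return token;
    }
}

/** Replays tokens by unserializing them from the stored bytes. */
final class SinkReplayStream implements SimpleTokenStream {
    private final DataInputStream in;

    SinkReplayStream(byte[] stored) {
        this.in = new DataInputStream(new ByteArrayInputStream(stored));
    }

    public String next() throws IOException {
        try {
            return in.readUTF();
        } catch (EOFException endOfStorage) {
            return null; // no more serialized tokens
        }
    }
}

final class TeeSinkSketch {
    /** Consumes a stream completely and returns its tokens in order. */
    static List<String> drain(SimpleTokenStream stream) throws IOException {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    /** Runs tokens through the tee, then replays them from the serialized sink. */
    static List<String> replayAfterTee(List<String> tokens) {
        try {
            ByteArrayOutputStream storage = new ByteArrayOutputStream();
            DataOutputStream sink = new DataOutputStream(storage);
            drain(new TeeStream(new ListTokenStream(tokens), sink));
            sink.flush();
            return drain(new SinkReplayStream(storage.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the sink here is just a stream of bytes, the replay side never
holds all tokens in a List, which is the point of the suggestion. The
sketch does not address the question raised in the reply, namely whether
a field fed from a token stream can still have its original value stored.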