From: "Grant Ingersoll"
Reply-To: "Lucene Developers List"
Date: Tue, 20 Apr 2004 11:26:38 -0400
Subject: Re: incorrect OO in lucene source?

The thread safety issues are in the search-side usage of Analyzer, not
in indexing.

>>> tdv@part.net 04/20/04 10:52AM >>>
Grant Ingersoll wrote:

> I agree with Robert, as I have had similar wishes about more interface
> capabilities, but also agree with Eric in that Lucene works great in a
> lot of ways. I have found that the current design forces you to
> hard-code things that shouldn't need to be hard-coded, especially in
> the TokenStream area. The idea of writing a new Analyzer every time
> you want to change a Tokenizer or TokenFilter is very limiting. In my
> application I need the flexibility to re-index and evaluate fairly
> often. The current Analyzer implementation would require me to write a
> new Analyzer for every experiment, and that is not manageable. Do
> others have this issue?

I had this issue. I have solved it by rewriting the API around
TokenStream (mainly introducing an interface that allows resetting the
source stream) and creating a generalized analyzer class. This analyzer
class holds a reference to the TokenStream pipeline to which it
delegates. A PerField analyzer is populated with Analyzers configured
from JNDI (essentially Tokenizer and TokenStreamDecorator compositions).
When tokenStream(String fieldName, Reader reader) is called, the
analyzer resets its TokenStream reference before returning it.

> I submitted a "broken" patch that converts the analyzers and token
> streams to interfaces, but as Doug pointed out, it is not currently
> thread-safe (I have another version that uses reflection that is
> thread-safe). I intend to go back and make it thread-safe, but haven't
> had the time. Anyway, this patch contains an interface implementation
> of Analyzer and TokenStream that we may find useful in the future, and
> if someone else wants to take up the ball and make it thread-safe, I
> don't think it would take too long.

What were the issues related to thread safety? Are invocations of an
analyzer within an IndexWriter not single-threaded? I was unsure of
this, but planned to object-pool my TokenStream compositions if needed.
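
The resettable-pipeline design described above can be sketched roughly
as follows. This is a minimal illustration under stated assumptions, not
the code from either patch: ResettableTokenStream, WhitespaceSource,
LowerCaseDecorator, PipelineAnalyzer and PipelineDemo are hypothetical
names, none of them are Lucene classes, and the JNDI configuration step
is left out. Because a single shared pipeline would be mutated by every
tokenStream() call, the sketch keeps one pipeline per thread with
ThreadLocal, which is one possible alternative to object pooling.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Supplier;

/** Hypothetical interface: a token stream whose source can be replaced. */
interface ResettableTokenStream {
    /** Returns the next token, or null when the stream is exhausted. */
    String next() throws IOException;

    /** Points the whole pipeline at a new source reader. */
    void reset(Reader source);
}

/** Source stage (hypothetical): splits the input on whitespace. */
final class WhitespaceSource implements ResettableTokenStream {
    private Reader in; // set by reset() before the first call to next()

    public String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) break; // whitespace ends the current token
            } else {
                sb.append((char) c);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    public void reset(Reader source) { this.in = source; }
}

/** Decorator stage (hypothetical): lower-cases tokens from the wrapped stream. */
final class LowerCaseDecorator implements ResettableTokenStream {
    private final ResettableTokenStream inner;

    LowerCaseDecorator(ResettableTokenStream inner) { this.inner = inner; }

    public String next() throws IOException {
        String token = inner.next();
        return token == null ? null : token.toLowerCase(Locale.ROOT);
    }

    // Resetting the outermost stage propagates down to the source stage.
    public void reset(Reader source) { inner.reset(source); }
}

/** Generalized analyzer (hypothetical): delegates to a configured pipeline. */
final class PipelineAnalyzer {
    // One pipeline instance per thread, so concurrent callers never share state.
    private final ThreadLocal<ResettableTokenStream> pipeline;

    PipelineAnalyzer(Supplier<ResettableTokenStream> factory) {
        this.pipeline = ThreadLocal.withInitial(factory);
    }

    /** Resets this thread's pipeline onto the reader and returns it. */
    ResettableTokenStream tokenStream(String fieldName, Reader reader) {
        // fieldName is unused here; a per-field variant could select a pipeline by it.
        ResettableTokenStream ts = pipeline.get();
        ts.reset(reader);
        return ts;
    }
}

final class PipelineDemo {
    /** Drains the analyzer's stream for the given text into a list. */
    static List<String> analyze(PipelineAnalyzer analyzer, String text) {
        try {
            ResettableTokenStream ts = analyzer.tokenStream("body", new StringReader(text));
            List<String> tokens = new ArrayList<>();
            for (String t = ts.next(); t != null; t = ts.next()) {
                tokens.add(t);
            }
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Changing the experiment means changing this composition, not writing a new class.
        PipelineAnalyzer analyzer =
            new PipelineAnalyzer(() -> new LowerCaseDecorator(new WhitespaceSource()));
        System.out.println(analyze(analyzer, "Hello Lucene World"));
        System.out.println(analyze(analyzer, "Second DOCUMENT")); // same pipeline, reset
    }
}
```

The second analyze() call reuses the pipeline built for the first one,
which is the point of the reset interface. Whether a per-thread pipeline
or an object pool is the better fit would depend on how many threads
search concurrently and how expensive the pipeline is to build.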
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org