In-Reply-To: <21228090C6E946F9873B45192743E48D@VEGA>
References: <21228090C6E946F9873B45192743E48D@VEGA>
Date: Thu, 10 Sep 2009 15:27:49 -0700
Message-ID: <85d3c3b60909101527i7374c20cy8b629d88142976fe@mail.gmail.com>
Subject: Re: Problem with CharStream and Tokenizers with custom reset(Reader) method
From: Jason Rutherglen
To: java-dev@lucene.apache.org

I've been seeing strange behavior that may be related to this. Sometimes,
seemingly at random, a query parsed and analyzed with Solr analyzers is
reduced to just its first clause; other times the exact same query is parsed
and analyzed to the full, correct query with all clauses. It's so baffling
that I haven't figured out an approach to debugging it. I wonder if it's
related to this stream-resetting issue.

On Thu, Sep 10, 2009 at 7:54 AM, Uwe Schindler wrote:
> When reviewing the new CharStream code added to Tokenizers, I found a
> serious problem with backwards compatibility and other Tokenizers that do
> not override reset(CharStream).
>
> The problem is that e.g. CharTokenizer only overrides reset(Reader):
>
>   public void reset(Reader input) throws IOException {
>     super.reset(input);
>     bufferIndex = 0;
>     offset = 0;
>     dataLen = 0;
>   }
>
> If you reset such a Tokenizer with another CharStream (not a Reader), this
> method will never be called, which breaks the whole Tokenizer.
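The dispatch problem described above can be reproduced without Lucene. The
following is only a minimal sketch: SketchCharStream, SketchCharReader,
SketchTokenizer and SketchCharTokenizer are hypothetical stand-ins, not the
real Lucene classes, and the field value 7 is just pretend leftover state.
Because Java picks the overload at compile time from the static argument type,
a call with a CharStream-typed argument binds to the base class overload and
skips the subclass override of reset(Reader).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for CharStream: a Reader subtype.
abstract class SketchCharStream extends Reader {
}

// Hypothetical adapter that turns any Reader into a SketchCharStream.
final class SketchCharReader extends SketchCharStream {
    private final Reader in;
    SketchCharReader(Reader in) { this.in = in; }
    @Override public int read(char[] buf, int off, int len) throws IOException {
        return in.read(buf, off, len);
    }
    @Override public void close() throws IOException { in.close(); }
}

// Base class with two overloads, mirroring the situation in the mail.
// (The sketch drops "throws IOException" to keep the demo short.)
class SketchTokenizer {
    protected Reader input;
    public void reset(Reader input) { this.input = input; }
    public void reset(SketchCharStream input) { this.input = input; }
}

// Subclass that overrides only reset(Reader), as CharTokenizer does in the mail.
class SketchCharTokenizer extends SketchTokenizer {
    int bufferIndex = 7; // pretend state left over from a previous run
    @Override public void reset(Reader input) {
        super.reset(input);
        bufferIndex = 0;
    }
}

class OverloadProblemDemo {
    // Argument is statically a SketchCharStream: the inherited
    // reset(SketchCharStream) is chosen, so bufferIndex is never cleared.
    static int resetWithCharStream() {
        SketchCharTokenizer t = new SketchCharTokenizer();
        SketchCharStream cs = new SketchCharReader(new StringReader("abc"));
        t.reset(cs);
        return t.bufferIndex; // still 7
    }

    // Argument is statically a Reader: the override runs and clears the state.
    static int resetWithReader() {
        SketchCharTokenizer t = new SketchCharTokenizer();
        Reader r = new StringReader("abc");
        t.reset(r);
        return t.bufferIndex; // 0
    }
}
```

In this sketch the stale bufferIndex after resetWithCharStream() plays the
role of the stale buffer state that the mail says breaks the Tokenizer.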
>
> As CharStream extends Reader, I propose to remove this reset(CharStream)
> method and simply do an instanceof check to detect whether the supplied
> Reader is not a CharStream and wrap it. We could also remove the extra ctor
> (because most Tokenizers have no support for passing CharStreams). If the
> ctor also checks with instanceof and wraps as needed, the code is backwards
> compatible and we do not need to add additional ctors in subclasses.
>
> As this instanceof check is always done in CharReader.get(), why not remove
> ctor(CharStream) and reset(CharStream) completely?
>
> Any thoughts?
>
> I would like to fix this somehow before RC4, I'm sorry :(
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
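
The proposal quoted above (a single reset(Reader) plus an instanceof check
that wraps when needed) could look roughly like the following. This is a
sketch under assumptions, not the actual Lucene patch: DemoCharStream,
DemoCharReader, DemoTokenizer and DemoCharTokenizer are hypothetical
stand-ins, and DemoCharReader.get() only imitates the instanceof check that
the mail attributes to CharReader.get().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for CharStream: a Reader subtype.
abstract class DemoCharStream extends Reader {
}

// Hypothetical stand-in for CharReader; get() wraps only when needed.
final class DemoCharReader extends DemoCharStream {
    private final Reader in;
    private DemoCharReader(Reader in) { this.in = in; }

    static DemoCharStream get(Reader r) {
        return (r instanceof DemoCharStream) ? (DemoCharStream) r : new DemoCharReader(r);
    }

    @Override public int read(char[] buf, int off, int len) throws IOException {
        return in.read(buf, off, len);
    }
    @Override public void close() throws IOException { in.close(); }
}

// One ctor and one reset, both taking Reader; no CharStream overloads.
class DemoTokenizer {
    protected DemoCharStream input;
    DemoTokenizer(Reader input) { this.input = DemoCharReader.get(input); }
    public void reset(Reader input) { this.input = DemoCharReader.get(input); }
}

// Subclass overriding reset(Reader) now sees every reset call.
class DemoCharTokenizer extends DemoTokenizer {
    int bufferIndex = 7; // pretend state left over from a previous run
    DemoCharTokenizer(Reader input) { super(input); }
    @Override public void reset(Reader input) {
        super.reset(input);
        bufferIndex = 0;
    }
}

class SingleResetDemo {
    // Even with a CharStream-typed argument, the only candidate is
    // reset(Reader), so the subclass override runs.
    static int resetWithCharStream() {
        DemoCharTokenizer t = new DemoCharTokenizer(new StringReader("x"));
        DemoCharStream cs = DemoCharReader.get(new StringReader("abc"));
        t.reset(cs);
        return t.bufferIndex; // 0
    }

    // An argument that already is a DemoCharStream is passed through unwrapped.
    static boolean passesThroughExistingStream() {
        DemoCharStream cs = DemoCharReader.get(new StringReader("abc"));
        return DemoCharReader.get(cs) == cs;
    }
}
```

With only one reset signature there is nothing left for overload resolution to
pick between, so existing subclasses that override reset(Reader) keep working
unchanged, which is the backwards-compatibility point made in the mail.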