From: Spyros Kapnissis
To: "java-user@lucene.apache.org"
Date: Tue, 24 Mar 2015 21:12:26 +0000 (UTC)
Subject: Re: CachingTokenFilter tests fail when using MockTokenizer

Hello Uwe,

Thanks a lot for your answer. It makes perfect sense - I knew something was wrong with CachingTokenFilter! I will try to modify and adapt the filter to avoid the error, as per your instructions.

By the way, is there a better way or pattern for consuming the TokenStream two (or more) times, maybe with an example from an existing filter?


On Tuesday, March 24, 2015 3:12 AM, Uwe Schindler wrote:

Hi,

One of the problems is that CachingTokenFilter is not 100% conformant to the TokenStream/TokenFilter specs. It is mainly used in Lucene internally for stuff like the highlighter, which needs to consume the same TokenStream multiple times. But when doing this, the code knows how to handle that. One problem is that reset() is wrongly defined: instead, the rewind case should be named rewind(), so it behaves correctly and cannot be confused with reset() [which is called before consumption automatically, which has side effects]. To me CachingTokenFilter is a bug by itself™. This filter is excluded from our random tests because of those problems (it never gets tested by TestRandomChains).

The problem in your code is that you wrap the underlying TokenStream with CachingTokenFilter inside incrementToken() and consume it, and this confuses the whole TS state machine. You should wrap the TokenFilter in the constructor with CachingTokenFilter, not too late in incrementToken() [at this point reset was already called on the underlying stream, so CachingTokenFilter will do this a second time]. This leads to this problem, which may later cause the end() problem.

In addition, the TokenFilter does not implement reset() correctly, so the whole thing cannot be reused in analyzers.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Spyros Kapnissis [mailto:skapni@yahoo.com.INVALID]
> Sent: Monday, March 23, 2015 11:02 PM
> To: java-user@lucene.apache.org; Ahmet Arslan
> Subject: Re: CachingTokenFilter tests fail when using MockTokenizer
>
> Hello Ahmet,
> Unfortunately the test still fails with the same error: "end() called before
> incrementToken() returned false!". I am not sure if I am misusing
> CachingTokenFilter, or if it cannot be used with MockTokenizer, since it
> "always calls end() before incrementToken() returns false".
> Spyros
>
>
> On Monday, March 23, 2015 9:12 PM, Ahmet Arslan wrote:
>
> Hi Spyros,
>
> Not 100% sure, but I think you should override the reset method.
>
> @Override
> public void reset() throws IOException {
>   super.reset();
>
>   cachedInput = null;
> }
>
> Ahmet
>
>
> On Monday, March 23, 2015 1:29 PM, Spyros Kapnissis wrote:
> Hello,
> We have a couple of custom token filters that use CachingTokenFilter
> internally. However, when we try to test them with MockTokenizer so that
> we can have these nice TokenStream API checks that it provides, the tests
> fail with: "java.lang.AssertionError: end() called before incrementToken()
> returned false!"
>
> Here is a link with a unit test to reproduce the issue:
> https://gist.github.com/spyk/c783c72689410070811b
> Do we misuse CachingTokenFilter? Or is it an issue of MockTokenizer when
> used with CachingTokenFilter?
> Thanks!
> Spyros
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
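
For readers of this thread, the two points in Uwe's reply (wrap the input with the caching layer in the constructor, and keep "rewind the cache" separate from reset()) can be sketched with a toy model. This is only an illustration under stated assumptions: ToyStream, CachingToyFilter, TwoPassFilter and TwoPassDemo are hypothetical classes written for this sketch, not Lucene classes, and rewind() is the name Uwe proposes above, not a method taken from Lucene. The toy stream enforces just one of the checks discussed in the thread, the "end() called before incrementToken() returned false!" rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Toy stand-in for a tokenizer; hypothetical, NOT a Lucene class. */
class ToyStream {
    private final List<String> tokens;
    private Iterator<String> it;   // null until reset() has been called
    private boolean exhausted;
    String term;                   // current token after incrementToken() returned true

    ToyStream(String... tokens) { this.tokens = Arrays.asList(tokens); }

    void reset() { it = tokens.iterator(); exhausted = false; }

    boolean incrementToken() {
        if (it == null) throw new IllegalStateException("incrementToken() called before reset()");
        if (it.hasNext()) { term = it.next(); return true; }
        exhausted = true;
        return false;
    }

    void end() {
        if (!exhausted) {
            throw new IllegalStateException("end() called before incrementToken() returned false!");
        }
    }
}

/** Toy caching layer with reset() and rewind() kept separate (hypothetical). */
class CachingToyFilter {
    private final ToyStream input;
    private List<String> cache;    // filled lazily on the first pass
    private int pos;
    String term;

    CachingToyFilter(ToyStream input) { this.input = input; }

    /** Starts a new consumption cycle: drops the cache and resets the input once. */
    void reset() { cache = null; pos = 0; input.reset(); }

    /** Replays the cached tokens without touching the input. */
    void rewind() { pos = 0; }

    boolean incrementToken() {
        if (cache == null) {       // first pass: drain the input exactly once
            cache = new ArrayList<>();
            while (input.incrementToken()) cache.add(input.term);
        }
        if (pos < cache.size()) { term = cache.get(pos++); return true; }
        return false;
    }

    void end() { input.end(); }
}

/** Two-pass filter: counts all tokens first, then emits "token/total". */
class TwoPassFilter {
    private final CachingToyFilter cached;  // wrapped in the constructor, not in incrementToken()
    private int total = -1;
    String term;

    TwoPassFilter(ToyStream input) { this.cached = new CachingToyFilter(input); }

    /** Clears per-stream state so the filter can be consumed again. */
    void reset() { cached.reset(); total = -1; }

    boolean incrementToken() {
        if (total < 0) {           // pass 1: count tokens, then rewind the cache
            total = 0;
            while (cached.incrementToken()) total++;
            cached.rewind();
        }
        if (!cached.incrementToken()) return false;  // pass 2: replay
        term = cached.term + "/" + total;
        return true;
    }

    void end() { cached.end(); }
}

class TwoPassDemo {
    /** Consumer workflow: reset(), incrementToken() until false, then end(). */
    static List<String> consume(TwoPassFilter f) {
        List<String> out = new ArrayList<>();
        f.reset();
        while (f.incrementToken()) out.add(f.term);
        f.end();
        return out;
    }

    public static void main(String[] args) {
        TwoPassFilter f = new TwoPassFilter(new ToyStream("a", "b", "c"));
        System.out.println(consume(f));
        System.out.println(consume(f));  // reuse works because reset() clears the state
    }
}
```

In this sketch the input is reset exactly once per consumption cycle, because only the outermost reset() reaches it. Creating the caching layer inside incrementToken(), after reset() had already run, is the situation Uwe describes as confusing the state machine. A real filter would have to follow the actual Lucene TokenStream contract rather than this simplified one.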