From: "Uwe Schindler" <uwe@thetaphi.de>
To: java-user@lucene.apache.org
Subject: RE: Confusion with Analyzer.tokenStream() re-use in 4.1
Date: Wed, 27 Feb 2013 20:02:41 +0100
Message-ID: <000801ce151d$04f011c0$0ed03540$@thetaphi.de>
In-Reply-To: <1361985921447-4043427.post@n3.nabble.com>
The problem here is that the TokenStream is instantiated in the same thread from two different code paths and consumed later. When you add fields, the indexer fetches the reused TokenStreams one after another and consumes each one directly after getting it; it does not interleave them. In your case, the second field is instantiated using a TokenStream while the first one is still pending. Unfortunately, once you ask the analyzer for another TokenStream, the already opened one (the first field's) becomes invalid.

Don't use new Field(name, TokenStream) with TokenStreams obtained from Analyzers, because they are only "valid" for a very short time. If you need to do this, use a second Analyzer instance. If you instead add fields with a String value, the TokenStream is created on the fly and consumed by the DocumentsWriter directly after it is fetched.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Konstantyn Smirnov [mailto:injecteer@yahoo.com]
> Sent: Wednesday, February 27, 2013 6:25 PM
> To: java-user@lucene.apache.org
> Subject: Confusion with Analyzer.tokenStream() re-use in 4.1
>
> Dear all,
>
> I'm using the following test-code:
>
> Document doc = new Document()
> Analyzer a = new SimpleAnalyzer( Version.LUCENE_41 )
>
> TokenStream inputTS = a.tokenStream( 'name1', new StringReader( 'aaa bbb ccc' ) )
> Field f = new TextField( 'name1', inputTS )
> doc.add f
>
> TokenStream ts = doc.getField( 'name1' ).tokenStreamValue()
> ts.reset()
>
> String sb = ''
> while( ts.incrementToken() ) sb += ts.getAttribute( CharTermAttribute ) + '|'
> assert 'aaa|bbb|ccc|' == sb
>
> inputTS = a.tokenStream( 'name2', new StringReader( 'xxx zzz' ) )
> f = new TextField( 'name2', inputTS )
> doc.add f
>
> ts = doc.getField( 'name2' ).tokenStreamValue()
> ts.reset()
>
> sb = ''
> while( ts.incrementToken() ) sb += ts.getAttribute( CharTermAttribute ) + '|'
> assert 'xxx|zzz|' == sb // << FAILS! -> sb == '' and ts.incrementToken() == false
>
> The first added field lets me read its tokenStreamValue() tokens; all subsequent
> calls return nothing, unless I re-instantiate the analyzer.
>
> Another strange thing is that just before adding a new field to the
> document, the TokenStream is still filled.
>
> What am I doing wrong?
>
> TIA
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Confusion-with-Analyzer-
> tokenStream-re-use-in-4-1-tp4043427.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
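[A minimal sketch of the safe pattern Uwe describes, in plain Java. It is untested and assumes the Lucene 4.1 jars are on the classpath; the class name and the drain() helper are made up for illustration. The point is that each TokenStream is fully consumed, ended, and closed before the Analyzer is asked for the next one, so the analyzer's internal reuse of its components never invalidates a stream that is still being read.]

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenStreamReuseDemo {

    /** Drain one TokenStream completely before the Analyzer hands out the next. */
    static String drain(Analyzer a, String field, String text) throws IOException {
        TokenStream ts = a.tokenStream(field, new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        StringBuilder sb = new StringBuilder();
        ts.reset();                       // mandatory before incrementToken()
        while (ts.incrementToken()) {
            sb.append(term).append('|');
        }
        ts.end();                         // finish the stream completely...
        ts.close();                       // ...before the Analyzer may reuse it
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Analyzer a = new SimpleAnalyzer(Version.LUCENE_41);
        // Each stream is drained before the next tokenStream() call, so the
        // reuse is safe -- unlike stashing both streams in Fields first.
        System.out.println(drain(a, "name1", "aaa bbb ccc")); // aaa|bbb|ccc|
        System.out.println(drain(a, "name2", "xxx zzz"));     // xxx|zzz|
    }
}
```

[For indexing itself, the simpler fix per the advice above is to pass the String to the field (e.g. new TextField(name, "aaa bbb ccc", Field.Store.NO)) and let the indexer pull the TokenStream on the fly.]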