Subject: Re: TokenStream: How to get token text?
From: Steve Rowe
Date: Tue, 25 Dec 2012 13:48:41 -0500
To: java-user@lucene.apache.org

Hi Dima,

Did you see my response to your earlier email? I think it's what you're looking for:

http://markmail.org/message/jdcjxauj4odyuv7e

Steve

On Dec 25, 2012, at 1:17 PM, dokondr wrote:

> Hello,
> Please, help. I am lost in TokenStream / Token / Analyzer API.
> I am trying to figure out how to get _token_itself_ or token text while
> looking at "Invoking the Analyzer" example (see example below and also at:
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-summary.html?is-external=true#package_description
> )
>
> Method "ts.reflectAsString(true)" returns lots of useful info:
> org.apache.lucene.analysis.tokenattributes.CharTermAttribute#term=some,
> org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute#bytes=[73 6f 6d 65],
> org.apache.lucene.analysis.tokenattributes.OffsetAttribute#startOffset=0,
> org.apache.lucene.analysis.tokenattributes.OffsetAttribute#endOffset=4,
> org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute#positionIncrement=1,
> org.apache.lucene.analysis.tokenattributes.TypeAttribute#type=,
> org.apache.lucene.analysis.tokenattributes.KeywordAttribute#keyword=false
>
> Yet, how to get token itself? In this case "some"?
>
> Thanks!
>
> ------ Example in the documentation --------
>
> Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
> Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
> TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
> OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
>
> try {
>   ts.reset(); // Resets this stream to the beginning. (Required)
>   while (ts.incrementToken()) {
>     // Use AttributeSource.reflectAsString(boolean)
>     // for token stream debugging.
>     System.out.println("token: " + ts.reflectAsString(true));
>
>     System.out.println("token start offset: " + offsetAtt.startOffset());
>     System.out.println("  token end offset: " + offsetAtt.endOffset());
>   }
>   ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
> } finally {
>   ts.close(); // Release resources associated with this stream.
> }
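
A note on the quoted output: the token text is already visible there, both as CharTermAttribute#term=some and as the TermToBytesRefAttribute bytes [73 6f 6d 65], which are the UTF-8 encoding of the same term. In Lucene itself the usual route is to call ts.addAttribute(CharTermAttribute.class) before ts.reset() and then read the attribute's toString() inside the incrementToken() loop. The sketch below does not use Lucene at all; it relies only on the JDK to show that those four bytes decode to "some". The class and method names (TermBytesDemo, decodeHexBytes) are hypothetical and exist only for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene: decodes the hex bytes that
// reflectAsString(true) printed for TermToBytesRefAttribute#bytes.
class TermBytesDemo {

    /** Decodes space-separated hex bytes, e.g. "73 6f 6d 65", as UTF-8 text. */
    static String decodeHexBytes(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Parse each two-digit hex value and narrow it to a byte.
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes copied from the quoted reflectAsString output.
        System.out.println(decodeHexBytes("73 6f 6d 65")); // prints "some"
    }
}
```

Decoding the debug string like this is only a sanity check on what the attributes hold; in real code, reading CharTermAttribute directly avoids parsing reflectAsString output.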