lucene-java-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: TokenStream: How to get token text?
Date Tue, 25 Dec 2012 18:48:41 GMT
Hi Dima,

Did you see my response to your earlier email?  I think it's what you're looking for:

http://markmail.org/message/jdcjxauj4odyuv7e
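
In case it is useful here as well: the token text is exposed through CharTermAttribute, the same attribute that appears as "term=" in your reflectAsString output. Below is a minimal sketch, assuming Lucene 4.0 with StandardAnalyzer on the classpath. The class name TokenTextDemo and the helper tokenTexts are made up for illustration and are not part of Lucene.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical demo class, not part of Lucene.
public class TokenTextDemo {

    // Hypothetical helper: collects the text of every token the analyzer emits.
    static List<String> tokenTexts(Analyzer analyzer, String field, String text)
            throws IOException {
        List<String> result = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
        // Register the term attribute on the stream before consuming it.
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset(); // required before the first incrementToken()
            while (ts.incrementToken()) {
                // The attribute instance is reused for each token,
                // so copy its contents out with toString().
                result.add(termAtt.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Lucene 4.0, as in the linked documentation.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
        for (String token : tokenTexts(analyzer, "myfield", "some text goes here")) {
            System.out.println("token text: " + token);
        }
    }
}
```

With your sample text the first token printed should be "some", matching the term= value in your output. Note that termAtt.toString() is called inside the loop: the stream updates the same attribute instance on every incrementToken(), so holding on to the attribute alone would only ever show the latest token.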

Steve

On Dec 25, 2012, at 1:17 PM, dokondr <dokondr@gmail.com> wrote:

> Hello,
> Please help. I am lost in the TokenStream / Token / Analyzer API.
> I am trying to figure out how to get the token itself (the token text) while
> looking at the "Invoking the Analyzer" example (see the example below and also:
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-summary.html?is-external=true#package_description
> ).
> 
> Calling ts.reflectAsString(true) returns lots of useful info:
> org.apache.lucene.analysis.tokenattributes.CharTermAttribute#term=some,org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute#bytes=[73 6f 6d 65],org.apache.lucene.analysis.tokenattributes.OffsetAttribute#startOffset=0,org.apache.lucene.analysis.tokenattributes.OffsetAttribute#endOffset=4,org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute#positionIncrement=1,org.apache.lucene.analysis.tokenattributes.TypeAttribute#type=<ALPHANUM>,org.apache.lucene.analysis.tokenattributes.KeywordAttribute#keyword=false
> 
> Yet how do I get the token itself, in this case "some"?
> 
> Thanks!
> 
> ------ Example in the documentation --------
> 
>    Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
>    Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
>    TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
>    // addAttribute must be called on the stream itself
>    OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
> 
>    try {
>      ts.reset(); // Resets this stream to the beginning. (Required)
>      while (ts.incrementToken()) {
>        // Use AttributeSource.reflectAsString(boolean)
>        // for token stream debugging.
>        System.out.println("token: " + ts.reflectAsString(true));
> 
>        System.out.println("token start offset: " + offsetAtt.startOffset());
>        System.out.println("  token end offset: " + offsetAtt.endOffset());
>      }
>      ts.end();   // Perform end-of-stream operations, e.g. set the final offset.
>    } finally {
>      ts.close(); // Release resources associated with this stream.
>    }



