lucene-java-user mailing list archives

From Adriano Crestani <adrianocrest...@gmail.com>
Subject Re: how to reuse a tokenStream?
Date Sat, 29 May 2010 07:41:05 GMT
Hi,

If the reader supports reset(), try resetting it as well, not just the
token stream.

Also, if your final code really uses WhitespaceAnalyzer, I think you
should use WhitespaceTokenizer directly instead of WhitespaceAnalyzer:

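WhitespaceTokenizer tokenizer = new WhitespaceTokenizer(reader);
// first pass: consume tokens
reader.reset();          // rewind the underlying reader (only if it supports reset)
tokenizer.reset(reader); // point the tokenizer at the rewound reader
// second pass: consume it again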
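
To show just the reader side of this without Lucene, here is a minimal plain-Java sketch. The class and method names (TwoPassTokens, readTokens, rewind, countTokens) are made up for illustration, and the whitespace split is only a stand-in for a real tokenizer. It assumes the reader is a StringReader, whose reset() goes back to the start of the string when no mark has been set; other readers may not support reset at all.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: shows why a second pass over a
// consumed reader yields nothing until the reader itself is rewound.
class TwoPassTokens {

    // Drains the reader and splits the text on whitespace.
    static List<String> readTokens(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toString().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Rewinds the reader; fails for readers that do not support reset().
    static void rewind(Reader reader) {
        try {
            reader.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // First-pass statistic: how many times each token occurs.
    static Map<String, Integer> countTokens(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String t : tokens) {
            Integer old = counts.get(t);
            counts.put(t, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("a b a c");
        List<String> firstPass = readTokens(reader);   // consumes the reader
        List<String> noReset = readTokens(reader);     // reader is already at end
        rewind(reader);                                // back to the start
        List<String> secondPass = readTokens(reader);  // tokens are available again
        System.out.println(countTokens(firstPass));
        System.out.println(noReset.size() + " tokens without reset, "
                + secondPass.size() + " after reset");
    }
}
```

The same idea applies to the tokenizer snippet above: the tokenizer can only produce tokens again if the reader underneath it has been rewound first.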

Best Regards,
Adriano Crestani

On Sat, May 29, 2010 at 1:43 AM, Li Li <fancyerii@gmail.com> wrote:

> I want to implement an analyzer which uses WhitespaceAnalyzer first and
> then my own TokenFilter. But my filter needs global information about the
> tokens, such as how many times each token occurs. So in the tokenStream
> method I iterate over the TokenStream to collect everything I need, then
> pass this information to my own TokenFilter. But my TokenFilter's
> incrementToken method always gets nothing from the TokenStream
> because it has already been consumed.
> public class MyAnalyzer extends Analyzer {
> ...
>         Analyzer wa=new WhitespaceAnalyzer();
>         public TokenStream tokenStream(String fieldName, Reader reader) {
>               TokenStream tokenStream=null;
>
>               try {
>                       tokenStream = wa.reusableTokenStream(fieldName, reader);
>                       while(tokenStream.incrementToken()){
>                               //do something
>                       }
>               } catch (IOException e1) {
>                       e1.printStackTrace();
>               }
> /*
>               try {
>                       tokenStream.reset();
>                       //tokenStream = wa.reusableTokenStream(fieldName, reader);
>                       while(tokenStream.incrementToken()){
>                                //it never gets here, which means the tokenStream is not reusable.
>                                System.out.println("resuable");
>                       }
>               } catch (IOException e) {
>                        e.printStackTrace();
>               }
> */
>              MyFilter stream = new MyFilter(tokenStream);
>              return stream;
>      }
> }
>
> 2010/5/28 Erick Erickson <erickerickson@gmail.com>:
> > What is the problem you're seeing? Maybe a stack trace?
> > You haven't told us what the incorrect behavior is.
> >
> > Best
> > Erick
> >
> > On Fri, May 28, 2010 at 12:52 AM, Li Li <fancyerii@gmail.com> wrote:
> >
> >> I want to analyze a text twice so that I can get some statistical
> >> information from this text
> >>                TokenStream tokenStream=null;
> >>                Analyzer wa=new WhitespaceAnalyzer();
> >>                try {
> >>                        tokenStream = wa.reusableTokenStream(fieldName, reader);
> >>                        while(tokenStream.incrementToken()){
> >>                                //do something
> >>                        }
> >>                } catch (IOException e1) {
> >>                        e1.printStackTrace();
> >>                }
> >>
> >>                try {
> >>                        tokenStream.reset();
> >>                        //tokenStream = wa.reusableTokenStream(fieldName, reader);
> >>                        while(tokenStream.incrementToken()){
> >>                                //not here
> >>                                System.out.println("resuable");
> >>                        }
> >>                } catch (IOException e) {
> >>                        // TODO Auto-generated catch block
> >>                        e.printStackTrace();
> >>                }
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
