lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shai Erera <ser...@gmail.com>
Subject Re: How to deal with Token in the new TS API
Date Sun, 22 Nov 2009 13:57:43 GMT
Perhaps I misunderstand something. The current use case I'm trying to solve
is - I have an abbreviations TokenFilter which reads a token and stores it.
If the next token is end-of-sentence, it checks whether the previous one is
in the abbreviations list, and discards the end-of-sentence token. I need to
store the first token somewhere so I can reference it.

Example: "hello mr. shai"
First token = hello -> store it and return
Second token = mr -> store it and return
Third token = "." -> check if "mr" is an abbreviation, if so don't return
".".
Fourth token = "shai" -> store it and return.
...

How do I store "mr" (or any of the others)? It was easy w/ copyTo. If I
captureState, I get a State, but I can't query it for a TermAttribute. Any
ideas?

Shai

On Sun, Nov 22, 2009 at 3:33 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Use captureState and save the state somewhere. You can restore the state
> with restoreState to the TokenStream. CachingTokenFilter does this.
>
> So the new API uses the State object to put away tokens for later
> reference.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Shai Erera [mailto:serera@gmail.com]
> > Sent: Sunday, November 22, 2009 2:29 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: How to deal with Token in the new TS API
> >
> > ok so from what I understand, I should stop working w/ Token, and move to
> > working w/ the Attributes.
> >
> > addAttribute indeed does not work. Even though it does not through an
> > exception, if I call in.addAttribute(Token.class), I get a new instance
> of
> > Token and not the once that was added by in. So this is even more severe
> > than just not blocking this option.
> >
> > I thought I can move to use addAttributeImpl, but that won't help me,
> > because I won't be able to call getAttribute(Token.class).
> >
> > So this leaves me w/ just working w/ the interfaces.
> >
> > What do I need to do in order to clone an attribute? Previously I used
> > token.copyTo(target). How I can do it now if I don't have copyTo on the
> > interfaces, and/or clone?
> >
> > Shai
> >
> > On Sun, Nov 22, 2009 at 2:58 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> > > > But I do use addAttribute(Token.class), so I don't understand why you
> > say
> > > > it's not possible. And I completely don't understand why the new API
> > > > allows
> > > > me to just work w/ interfaces and not impls ... A while ago I got the
> > > > impression that we're trying to get rid of interfaces because they're
> > not
> > > > easy to maintain back-compat with ...
> > >
> > > AddAttribute(Token.class) should throw an Exception, but it doesn't
> > (it's a
> > > bug in 3.0). addAttribute should only affect interfaces, it also
> accepts
> > > Token, because the AttributeFactory accepts it - bang.
> > >
> > > Sorry, but you can only pass attribute class literals to
> > > addAttribute/getAttribute/hasAttribute and so on.
> > >
> > > Sorry.
> > >
> > > Uwe
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message