lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: How to deal with Token in the new TS API
Date Sun, 22 Nov 2009 14:34:56 GMT
If you just want to lookup if "Mr" is an abbreviation, why not look it up
when you handle that token and set a boolean variable in the TS
(lastTokenWasAbbreviation). When you process the ".", remove it if the
Boolean is set.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Shai Erera [mailto:serera@gmail.com]
> Sent: Sunday, November 22, 2009 3:28 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to deal with Token in the new TS API
> 
> What I've done is:
> 
> State state = in.captureState();
> ...
> // Upon new call to incrementToken().
> State tmp = in.captureState();
> in.restoreState(state);
> // check if termAttribute is an abbreviation.
> If not : in.restoreState(tmp);
> 
> But seems a lot of capturing/restoring to me ... how expensive is that?
> 
> Shai
> 
> On Sun, Nov 22, 2009 at 3:57 PM, Shai Erera <serera@gmail.com> wrote:
> 
> > Perhaps I misunderstand something. The current use case I'm trying to
> solve
> > is - I have an abbreviations TokenFilter which reads a token and stores
> it.
> > If the next token is end-of-sentence, it checks whether the previous one
> is
> > in the abbreviations list, and discards the end-of-sentence token. I
> need to
> > store the first token somewhere so I can reference it.
> >
> > Example: "hello mr. shai"
> > First token = hello -> store it and return
> > Second token = mr -> store it and return
> > Third token = "." -> check if "mr" is an abbreviation, if so don't
> return
> > ".".
> > Fourth token = "shai" -> store it and return.
> > ...
> >
> > How do I store "mr" (or any of the others)? It was easy w/ copyTo. If I
> > captureState, I get a State, but I can't query it for a TermAttribute.
> Any
> > ideas?
> >
> > Shai
> >
> >
> > On Sun, Nov 22, 2009 at 3:33 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> >> Use captureState and save the state somewhere. You can restore the
> state
> >> with restoreState to the TokenStream. CachingTokenFilter does this.
> >>
> >> So the new API uses the State object to put away tokens for later
> >> reference.
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: uwe@thetaphi.de
> >>
> >> > -----Original Message-----
> >> > From: Shai Erera [mailto:serera@gmail.com]
> >> > Sent: Sunday, November 22, 2009 2:29 PM
> >> > To: java-user@lucene.apache.org
> >> > Subject: Re: How to deal with Token in the new TS API
> >> >
> >> > ok so from what I understand, I should stop working w/ Token, and
> move
> >> to
> >> > working w/ the Attributes.
> >> >
> >> > addAttribute indeed does not work. Even though it does not through an
> >> > exception, if I call in.addAttribute(Token.class), I get a new
> instance
> >> of
> >> > Token and not the once that was added by in. So this is even more
> severe
> >> > than just not blocking this option.
> >> >
> >> > I thought I can move to use addAttributeImpl, but that won't help me,
> >> > because I won't be able to call getAttribute(Token.class).
> >> >
> >> > So this leaves me w/ just working w/ the interfaces.
> >> >
> >> > What do I need to do in order to clone an attribute? Previously I
> used
> >> > token.copyTo(target). How I can do it now if I don't have copyTo on
> the
> >> > interfaces, and/or clone?
> >> >
> >> > Shai
> >> >
> >> > On Sun, Nov 22, 2009 at 2:58 PM, Uwe Schindler <uwe@thetaphi.de>
> wrote:
> >> >
> >> > > > But I do use addAttribute(Token.class), so I don't understand
why
> >> you
> >> > say
> >> > > > it's not possible. And I completely don't understand why the
new
> API
> >> > > > allows
> >> > > > me to just work w/ interfaces and not impls ... A while ago I
got
> >> the
> >> > > > impression that we're trying to get rid of interfaces because
> >> they're
> >> > not
> >> > > > easy to maintain back-compat with ...
> >> > >
> >> > > AddAttribute(Token.class) should throw an Exception, but it doesn't
> >> > (it's a
> >> > > bug in 3.0). addAttribute should only affect interfaces, it also
> >> accepts
> >> > > Token, because the AttributeFactory accepts it - bang.
> >> > >
> >> > > Sorry, but you can only pass attribute class literals to
> >> > > addAttribute/getAttribute/hasAttribute and so on.
> >> > >
> >> > > Sorry.
> >> > >
> >> > > Uwe
> >> > >
> >> > >
> >> > > -------------------------------------------------------------------
> --
> >> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> >> > >
> >> > >
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message