lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: How to deal with Token in the new TS API
Date Sun, 22 Nov 2009 14:27:45 GMT
What I've done is:

State state = in.captureState();
...
// On the next call to incrementToken():
State tmp = in.captureState();   // save the current token
in.restoreState(state);          // bring back the previous token
// Check whether termAttribute is an abbreviation.
// If it is not: in.restoreState(tmp);

But that seems like a lot of capturing/restoring to me ... how expensive is that?
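
For reference, the look-behind logic of the use case quoted below can be sketched in plain Java, without any Lucene classes. This is only a model under stated assumptions: tokens are plain strings, the class name `AbbreviationModel`, the method `filter`, and the abbreviation list are hypothetical, and the `previous` variable stands in for whatever the real filter keeps between calls to incrementToken(). It suggests that only the text of the previous token is needed for the abbreviation check, so a copy of the term text may be enough in place of a full captured State; whether that holds for a real filter is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Plain-Java model of the abbreviation filter logic; this is not Lucene code. */
class AbbreviationModel {

    /** Hypothetical abbreviation list, for illustration only. */
    static final Set<String> ABBREVIATIONS = Set.of("mr", "dr");

    /** Returns the tokens, dropping every "." that directly follows an abbreviation. */
    public static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        // Text of the last token that was returned; plays the role of the saved state.
        String previous = null;
        for (String token : tokens) {
            boolean endOfSentence = ".".equals(token);
            if (endOfSentence && previous != null && ABBREVIATIONS.contains(previous)) {
                // The period belongs to the abbreviation, so it is discarded.
                previous = null;
                continue;
            }
            out.add(token);
            previous = token;
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [hello, mr, shai]
        System.out.println(filter(List.of("hello", "mr", ".", "shai")));
    }
}
```

In this model a "." that follows a non-abbreviation is passed through unchanged, which matches the description of the end-of-sentence token in the quoted message.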

Shai

On Sun, Nov 22, 2009 at 3:57 PM, Shai Erera <serera@gmail.com> wrote:

> Perhaps I misunderstand something. The current use case I'm trying to solve
> is - I have an abbreviations TokenFilter which reads a token and stores it.
> If the next token is end-of-sentence, it checks whether the previous one is
> in the abbreviations list, and discards the end-of-sentence token. I need to
> store the first token somewhere so I can reference it.
>
> Example: "hello mr. shai"
> First token = hello -> store it and return
> Second token = mr -> store it and return
> Third token = "." -> check if "mr" is an abbreviation, if so don't return ".".
> Fourth token = "shai" -> store it and return.
> ...
>
> How do I store "mr" (or any of the others)? It was easy w/ copyTo. If I
> captureState, I get a State, but I can't query it for a TermAttribute. Any
> ideas?
>
> Shai
>
>
> On Sun, Nov 22, 2009 at 3:33 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> Use captureState and save the state somewhere. You can later restore it
>> into the TokenStream with restoreState. CachingTokenFilter does this.
>>
>> So the new API uses the State object to put away tokens for later
>> reference.
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>> > -----Original Message-----
>> > From: Shai Erera [mailto:serera@gmail.com]
>> > Sent: Sunday, November 22, 2009 2:29 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Re: How to deal with Token in the new TS API
>> >
>> > ok so from what I understand, I should stop working w/ Token, and move to
>> > working w/ the Attributes.
>> >
>> > addAttribute indeed does not work. Even though it does not throw an
>> > exception, if I call in.addAttribute(Token.class), I get a new instance of
>> > Token and not the one that was added by in. So this is even more severe
>> > than just not blocking this option.
>> >
>> > I thought I can move to use addAttributeImpl, but that won't help me,
>> > because I won't be able to call getAttribute(Token.class).
>> >
>> > So this leaves me w/ just working w/ the interfaces.
>> >
>> > What do I need to do in order to clone an attribute? Previously I used
>> > token.copyTo(target). How can I do it now if I don't have copyTo on the
>> > interfaces, and/or clone?
>> >
>> > Shai
>> >
>> > On Sun, Nov 22, 2009 at 2:58 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> >
>> > > > But I do use addAttribute(Token.class), so I don't understand why you say
>> > > > it's not possible. And I completely don't understand why the new API
>> > > > allows me to just work w/ interfaces and not impls ... A while ago I got the
>> > > > impression that we're trying to get rid of interfaces because they're not
>> > > > easy to maintain back-compat with ...
>> > >
>> > > addAttribute(Token.class) should throw an Exception, but it doesn't (it's a
>> > > bug in 3.0). addAttribute should only accept interfaces; it also accepts
>> > > Token, because the AttributeFactory accepts it - bang.
>> > >
>> > > Sorry, but you can only pass attribute class literals to
>> > > addAttribute/getAttribute/hasAttribute and so on.
>> > >
>> > > Sorry.
>> > >
>> > > Uwe
>> > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >
>> > >
