lucene-java-user mailing list archives

From Damerian <>
Subject Re: Access next token in a stream
Date Thu, 09 Feb 2012 21:15:17 GMT
On 9/2/2012 8:54 PM, Steven A Rowe wrote:
> Hi Damerian,
> One way to handle your scenario is to hold on to the previous token, and
> only emit a token after you reach at least the second token (or at
> end-of-stream).  Your incrementToken() method could look something like:
> 1. Get current attributes: input.incrementToken()
> 2. If previous token does not exist:
>    2a. Store current attributes as previous token (see AttributeSource#cloneAttributes)
>    2b. Get current attributes: input.incrementToken()
> 3. Check for & store conditions that will affect previous token's attributes
> 4. Store current attributes as next token (see AttributeSource#cloneAttributes)
> 5. Copy previous token into current attributes (see AttributeSource#copyTo);
>     the target will be "this", which is an AttributeSource.
> 6. Make changes based on conditions found in step #3 above
> 7. set previous token = next token
> 8. return true
> (Everywhere I say "token" I mean "instance of AttributeSource".)
> The final token in the input stream will need special handling, as will
> single-token input streams.
> Good luck,
> Steve
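
The steps above can be modeled outside Lucene. The following is a minimal plain-Java sketch, not a real TokenFilter: tokens are plain strings, and the class DelayedTokenStream and its methods term(), following() and trace() are hypothetical names made up for illustration. It only shows the control flow of emitting each token one step late, so that the token after it is already known at emission time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of a filter that emits each token one step late.
// Hypothetical class; it only mimics the shape of incrementToken().
class DelayedTokenStream {
    private final Iterator<String> input;
    private String previous;  // token read from input but not yet emitted
    private String term;      // token exposed by the latest incrementToken()
    private String following; // token after 'term', or null at end-of-stream

    DelayedTokenStream(Iterator<String> input) {
        this.input = input;
    }

    boolean incrementToken() {
        if (previous == null) {                 // step 2: nothing held back yet
            if (!input.hasNext()) return false; // empty or exhausted input
            previous = input.next();
        }
        // Steps 1 and 4: read the next token, if any, before emitting.
        String next = input.hasNext() ? input.next() : null;
        // Steps 5 and 6: emit the held-back token; 'following' is what a real
        // filter would inspect to adjust the emitted token's attributes.
        term = previous;
        following = next;
        previous = next;                        // step 7
        return true;                            // step 8
    }

    String term() { return term; }
    String following() { return following; }

    // Runs the stream to completion and records "term>following" pairs.
    static List<String> trace(List<String> tokens) {
        DelayedTokenStream stream = new DelayedTokenStream(tokens.iterator());
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term() + ">" + stream.following());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace(Arrays.asList("My", "name", "is", "James", "Bond")));
    }
}
```

The last token and a single-token input both go through the same path here: the held-back token is emitted with following() equal to null.
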
>> -----Original Message-----
>> From: Damerian []
>> Sent: Thursday, February 09, 2012 2:19 PM
>> To:
>> Subject: Access next token in a stream
>> Hello, I want to implement my own custom filter. My question is quite
>> simple, but I cannot find a solution to it no matter how I try:
>> How can I access the TermAttribute of the token that follows the one I
>> currently have in my stream?
>> For example, in the phrase "My name is James Bond", if, let's say, I am at
>> the token [My], I would like to be able to check the TermAttribute of
>> the following token [name] and fix my position increment accordingly.
>> Thank you in advance!
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
Hi Steve,
Thank you for your immediate reply. I will try your solution, but I feel
that it does not solve my case.
What I am trying to make is a filter that joins two consecutive
terms/tokens that each start with a capital letter (it tries to find
names and surnames and make each pair one token). In my earlier
example, when I examine [James], even if I store its TermAttribute in a
temporary token, how can I check the next one, [Bond], and join them
without actually emitting [James] on its own (and therefore creating a
separate term for it in my inverted index)?
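
To make the requirement concrete, here is a minimal plain-Java sketch of the behavior I am after. It is not Lucene code: tokens are plain strings, and the class NameJoiningStream and its methods are hypothetical names used only for illustration. It assumes that joined tokens are separated by a single space and that a run of more than two capitalized tokens is merged as a whole; both are assumptions of the sketch. A token is held back until the one after it has been read, so [James] is never emitted alone. In a real filter the held-back token would presumably be kept with something like AttributeSource#captureState and restoreState rather than a String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of a filter that merges consecutive capitalized tokens.
// Hypothetical class; it only mimics the shape of incrementToken().
class NameJoiningStream {
    private final Iterator<String> input;
    private String pending; // token read from input but not yet emitted
    private String term;    // token exposed by the latest incrementToken()

    NameJoiningStream(Iterator<String> input) {
        this.input = input;
    }

    static boolean isCapitalized(String s) {
        return !s.isEmpty() && Character.isUpperCase(s.charAt(0));
    }

    boolean incrementToken() {
        if (pending == null) {
            if (!input.hasNext()) return false; // empty or exhausted input
            pending = input.next();
        }
        String current = pending;
        pending = null;
        StringBuilder joined = new StringBuilder(current);
        // Look ahead only when the current token could start a name.
        while (isCapitalized(current) && input.hasNext()) {
            String next = input.next();
            if (isCapitalized(next)) {
                joined.append(' ').append(next); // absorb it, emit nothing yet
            } else {
                pending = next;                  // keep it for the next call
                break;
            }
        }
        term = joined.toString();
        return true;
    }

    String term() { return term; }

    // Runs the stream to completion and collects the emitted terms.
    static List<String> join(List<String> tokens) {
        NameJoiningStream stream = new NameJoiningStream(tokens.iterator());
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("My", "name", "is", "James", "Bond")));
    }
}
```

For the example phrase this yields [My], [name], [is] and [James Bond]; [My] stays on its own because [name] is not capitalized.
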
Thank you again for your insight, and I would really appreciate any other
views on the matter.

Regards, Damerian
