lucene-java-user mailing list archives

From Damerian <dameria...@gmail.com>
Subject Re: Access next token in a stream
Date Thu, 09 Feb 2012 21:15:17 GMT
On 9/2/2012 8:54 PM, Steven A Rowe wrote:
> Hi Damerian,
>
> One way to handle your scenario is to hold on to the previous token, and only emit a
> token after you reach at least the second token (or at end-of-stream).  Your incrementToken()
> method could look something like:
>
> 1. Get current attributes: input.incrementToken()
> 2. If previous token does not exist:
>    2a. Store current attributes as previous token (see AttributeSource#cloneAttributes)
>    2b. Get current attributes: input.incrementToken()
> 3. Check for & store conditions that will affect previous token's attributes
> 4. Store current attributes as next token (see AttributeSource#cloneAttributes)
> 5. Copy previous token into current attributes (see AttributeSource#copyTo);
>     the target will be "this", which is an AttributeSource.
> 6. Make changes based on conditions found in step #3 above
> 7. set previous token = next token
> 8. return true
>
> (Everywhere I say "token" I mean "instance of AttributeSource".)
>
> The final token in the input stream will need special handling, as will single-token
> input streams.
>
> Good luck,
> Steve
>
>> -----Original Message-----
>> From: Damerian [mailto:dameriangr@gmail.com]
>> Sent: Thursday, February 09, 2012 2:19 PM
>> To: java-user@lucene.apache.org
>> Subject: Access next token in a stream
>>
>> Hello, I want to implement my custom filter. My question is quite simple,
>> but I cannot find a solution to it no matter how I try:
>>
>> How can I access the TermAttribute of the token that follows the one I
>> currently have in my stream?
>>
>> For example, in the phrase "My name is James Bond", if let's say I am at
>> the token [My], I would like to be able to check the TermAttribute of
>> the following token [name] and fix my position increment accordingly.
>>
>> Thank you in advance!
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
Hi Steve,
Thank you for your immediate reply. I will try your solution, but I feel
that it does not solve my case.
What I am trying to make is a filter that joins two consecutive
terms/tokens that each start with a capital letter (it is trying to find
all the names/surnames and make each one a single token). In my
aforementioned example, when I examine [James], even if I store its
TermAttribute in a temporary token, how can I check the next one, [Bond],
and join them without actually emitting a token (and therefore creating a
term in my inverted index) that has [James] on its own?
Thank you again for your insight; I would really appreciate any other
views on the matter.

Regards, Damerian
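
For illustration only, here is a minimal sketch of the one-token lookahead
being asked about. It is not Lucene code: plain strings stand in for token
attributes, and the class name CapitalizedPairJoiner and its methods are
hypothetical. The assumption is that a token read ahead but not joined can be
buffered and emitted on the next call, which is the role the stored copy
(see AttributeSource#cloneAttributes in the quoted reply) would play in a real
filter. Position increments and offsets are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token filter; strings replace Lucene attributes. */
public class CapitalizedPairJoiner {
    private final Iterator<String> input;
    private String pending; // token read ahead but not yet emitted

    public CapitalizedPairJoiner(Iterator<String> input) {
        this.input = input;
    }

    /** Returns the next output token, or null at end of stream. */
    public String next() {
        String current;
        if (pending != null) {          // emit the buffered lookahead first
            current = pending;
            pending = null;
        } else if (input.hasNext()) {
            current = input.next();
        } else {
            return null;                // end of stream
        }

        // Peek at the following token only when a join is possible.
        if (isCapitalized(current) && input.hasNext()) {
            String lookahead = input.next();
            if (isCapitalized(lookahead)) {
                // Join the pair; [current] is never emitted on its own.
                return current + " " + lookahead;
            }
            pending = lookahead;        // not a match: keep it for the next call
        }
        return current;
    }

    private static boolean isCapitalized(String token) {
        return !token.isEmpty() && Character.isUpperCase(token.charAt(0));
    }

    /** Convenience helper that drains the joiner into a list. */
    public static List<String> join(List<String> tokens) {
        CapitalizedPairJoiner joiner = new CapitalizedPairJoiner(tokens.iterator());
        List<String> out = new ArrayList<>();
        for (String t = joiner.next(); t != null; t = joiner.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [My, name, is, James Bond]
        System.out.println(join(Arrays.asList("My", "name", "is", "James", "Bond")));
    }
}
```

In this sketch [My] is still emitted alone because [name] is not capitalized,
while [James] is held back until [Bond] has been inspected. Only pairs are
joined; a run of three capitalized tokens would yield one joined pair followed
by the third token.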



