lucene-java-user mailing list archives

From "Peter Keegan" <peter.kee...@charter.net>
Subject re: multiple tokens from a single input token
Date Mon, 10 Nov 2003 14:43:14 GMT
I would appreciate some clarification on how to generate multiple tokens from a single input
token.

In a previous message (see http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg04875.html),
Pierrick Brihaye provided the following code:

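// Fields presumably declared by the enclosing TokenFilter subclass (not shown in the original message):
//   private Token receivedToken;                             // current input token
//   private StringBuffer receivedText = new StringBuffer();  // its not-yet-emitted text
//   private String getNextPart();                            // presumably removes and returns the next piece of receivedText
public final Token next() throws IOException {
    while (true) {
        String emittedText;
        int positionIncrement = 0; // later parts of the same input token share its position
        // New token?
        if (receivedText.length() == 0) {
            receivedToken = input.next();
            if (receivedToken == null) return null; // end of stream
            receivedText.append(receivedToken.termText());
            positionIncrement = 1; // the first part advances to a new position
        }
        emittedText = getNextPart();
        // Warning: all tokens are emitted with the *same* offset
        if (emittedText.length() > 0) {
            Token emittedToken = new Token(emittedText, receivedToken.startOffset(), receivedToken.endOffset());
            emittedToken.setPositionIncrement(positionIncrement);
            return emittedToken;
        }
        // Empty part: loop and try again
    }
}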
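To see how a method like this yields several tokens per input token, here is a self-contained sketch adapted from the snippet above. It assumes hypothetical stand-ins for Lucene 1.x's Token and TokenStream (so it compiles without Lucene, and next() does not declare IOException) and a hypothetical getNextPart() that splits on '-'; the original message does not say how getNextPart() splits text. The point to notice is that receivedText survives between calls: next() only pulls a new input token once the previous token's text is used up, and the consumer keeps calling next() until it returns null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in modeled on Lucene 1.x's Token; not the real class.
class Token {
    private final String text;
    private final int start, end;
    private int positionIncrement = 1;
    Token(String text, int start, int end) { this.text = text; this.start = start; this.end = end; }
    String termText() { return text; }
    int startOffset() { return start; }
    int endOffset() { return end; }
    void setPositionIncrement(int inc) { positionIncrement = inc; }
    int getPositionIncrement() { return positionIncrement; }
}

// Hypothetical stand-in for TokenStream: next() returns null at end of stream.
interface TokenStream { Token next(); }

// Emits each part of an input token at that input token's position.
class PartsFilter implements TokenStream {
    private final Iterator<Token> input;
    private final StringBuffer receivedText = new StringBuffer(); // unconsumed text, kept between calls
    private Token receivedToken;

    PartsFilter(List<Token> tokens) { input = tokens.iterator(); }

    // Hypothetical getNextPart(): removes and returns the text up to the next '-'.
    private String getNextPart() {
        int dash = receivedText.indexOf("-");
        int end = dash < 0 ? receivedText.length() : dash;
        String part = receivedText.substring(0, end);
        receivedText.delete(0, dash < 0 ? end : end + 1);
        return part;
    }

    public Token next() {
        while (true) {
            int positionIncrement = 0; // later parts share the input token's position
            if (receivedText.length() == 0) { // previous text used up: pull a new input token
                receivedToken = input.hasNext() ? input.next() : null;
                if (receivedToken == null) return null;
                receivedText.append(receivedToken.termText());
                positionIncrement = 1;
            }
            String emittedText = getNextPart();
            if (emittedText.length() > 0) {
                Token t = new Token(emittedText, receivedToken.startOffset(), receivedToken.endOffset());
                t.setPositionIncrement(positionIncrement);
                return t;
            }
        }
    }
}

class PartsDemo {
    // Calls next() until it returns null and records "text@position".
    static List<String> drain(TokenStream ts) {
        List<String> out = new ArrayList<>();
        int position = -1;
        for (Token t = ts.next(); t != null; t = ts.next()) {
            position += t.getPositionIncrement();
            out.add(t.termText() + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        TokenStream ts = new PartsFilter(Arrays.asList(new Token("wi-fi", 0, 5), new Token("net", 6, 9)));
        System.out.println(drain(ts)); // [wi@0, fi@0, net@1]
    }
}
```

One input token ("wi-fi") comes out of two successive next() calls, both at position 0.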
I assume that you would extend the TokenFilter class and override the 'next' method. But what
I don't understand is how you return more than one Token (each with its own 'setPositionIncrement' value)
if the 'next' method is only called once for each input token.

For example, when my custom filter's 'next()' method receives token 'A' from 'DocumentWriter.invertDocument()',
it needs to return token 'A' and token 'B' at the same position. How is this done? It seems
I can only return one token at a time from 'next()'. I think I'm missing something obvious
:-(
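A minimal sketch of the 'A' plus 'B' case follows. It assumes hypothetical stand-ins for Lucene 1.x's Token and TokenStream (so it compiles on its own; the real next() also declares IOException), and SynonymFilter, ListStream and the synonym map are illustrative names, not Lucene classes. Consumers such as DocumentWriter keep calling next() until it returns null, so a filter can return 'A' from one call, keep 'B' in a pending queue, and return 'B' with a position increment of 0 from the following call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in modeled on Lucene 1.x's Token; not the real class.
class Token {
    private final String text;
    private final int start, end;
    private int positionIncrement = 1;
    Token(String text, int start, int end) { this.text = text; this.start = start; this.end = end; }
    String termText() { return text; }
    int startOffset() { return start; }
    int endOffset() { return end; }
    void setPositionIncrement(int inc) { positionIncrement = inc; }
    int getPositionIncrement() { return positionIncrement; }
}

// Hypothetical stand-in for TokenStream: next() returns null at end of stream.
interface TokenStream { Token next(); }

// Plays the role of the filter's input by returning a fixed list of tokens.
class ListStream implements TokenStream {
    private final Iterator<Token> it;
    ListStream(List<Token> tokens) { it = tokens.iterator(); }
    public Token next() { return it.hasNext() ? it.next() : null; }
}

// Returns each input token, then its synonym (if any) from the following call.
class SynonymFilter implements TokenStream {
    private final TokenStream input;
    private final Map<String, String> synonyms;
    private final Deque<Token> pending = new ArrayDeque<>(); // tokens waiting for a later call

    SynonymFilter(TokenStream input, Map<String, String> synonyms) {
        this.input = input;
        this.synonyms = synonyms;
    }

    public Token next() {
        if (!pending.isEmpty()) return pending.poll(); // emit a buffered token first
        Token t = input.next();
        if (t == null) return null; // end of stream
        String syn = synonyms.get(t.termText());
        if (syn != null) {
            Token s = new Token(syn, t.startOffset(), t.endOffset());
            s.setPositionIncrement(0); // 0 = same position as the token before it
            pending.add(s);
        }
        return t;
    }
}

class SynonymDemo {
    // Calls next() until it returns null and records "text@position".
    static List<String> drain(TokenStream ts) {
        List<String> out = new ArrayList<>();
        int position = -1;
        for (Token t = ts.next(); t != null; t = ts.next()) {
            position += t.getPositionIncrement();
            out.add(t.termText() + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> synonyms = new HashMap<>();
        synonyms.put("A", "B");
        TokenStream ts = new SynonymFilter(
            new ListStream(Arrays.asList(new Token("A", 0, 1), new Token("C", 2, 3))), synonyms);
        System.out.println(drain(ts)); // [A@0, B@0, C@1]
    }
}
```

The filter only pulls a new input token once its queue is empty, so it never reads ahead of what it has already returned.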

Thanks,
Peter