lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Getting the actual token from Token's term buffer
Date Sat, 08 Dec 2007 22:44:20 GMT
Hi,

It's been a while since I've written a custom TokenFilter, and I'm not having any luck getting
tokens out of the TokenStream in 2.3-dev.
I keep getting the default term buffer of size 10 with the following:

    public final Token next(Token result) throws IOException {
        result = input.next(result);
        if (result != null) {
            // termLength() gives me the actual term length, not the buffer of length 10
            final int len = result.termLength();
            result.setTermLength(len);
            System.out.println("LEN 1: " + len);           // prints the actual length
            // termBuffer() still gives me the buffer of length 10
            final char[] buffer = result.termBuffer();
            System.out.println("LEN 2: " + buffer.length); // and this prints 10
        }
        return result;
    }


Is the idea to:
1) get the char[] buffer from Token
2) get its real length via termLength()
3) manually fill a new char[] with the content of the buffer, minus the extra buffering?

I'm looking at Token to see how to get the *actual* term, but don't see anything, so it looks
like a Filter writer has to do one of these for each term buffer:

    public final Token next(Token result) throws IOException {
        result = input.next(result);
        if (result != null) {
            final int len = result.termLength();
            final char[] buffer = result.termBuffer();
            // copy only the valid prefix of the buffer into a right-sized array
            final char[] token = new char[len];
            System.arraycopy(buffer, 0, token, 0, len);
            // ... work with token here ...
        }
        return result;
    }

Am I missing a Token method I could use instead, or is this the new way to go?
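
To make the copy step concrete outside of Lucene, here is a minimal, self-contained sketch. It uses a plain char[] of size 10 as a stand-in for the Token's term buffer, so nothing here is Lucene API: the class name TermBufferDemo and the helpers actualTerm and actualTermText are made up for illustration. It only shows that the first len chars of the buffer are the term, and that the standard library can do the trimming (Arrays.copyOf, or the String(char[], int, int) constructor) instead of a hand-written System.arraycopy.

```java
import java.util.Arrays;

public class TermBufferDemo {

    // Hypothetical helper: returns a right-sized copy of the valid prefix
    // of an oversized buffer (same effect as new char[len] + System.arraycopy).
    static char[] actualTerm(char[] buffer, int len) {
        return Arrays.copyOf(buffer, len);
    }

    // Hypothetical helper: builds a String from the valid prefix only,
    // ignoring the unused tail of the buffer.
    static String actualTermText(char[] buffer, int len) {
        return new String(buffer, 0, len);
    }

    public static void main(String[] args) {
        // Stand-in for the term buffer: capacity 10, only 3 chars in use.
        char[] buffer = new char[10];
        "fox".getChars(0, 3, buffer, 0);
        int len = 3; // stand-in for what termLength() would report

        System.out.println("buffer capacity: " + buffer.length);
        System.out.println("term length:     " + actualTerm(buffer, len).length);
        System.out.println("term text:       " + actualTermText(buffer, len));
    }
}
```

If the filter only needs to inspect or rewrite characters, it can also work directly on buffer[0] through buffer[len - 1] and skip the copy entirely; the copy is only needed when a standalone char[] or String is required.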

Thanks,
Otis


