lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Commented: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)
Date Sun, 07 Mar 2010 20:40:27 GMT


Uwe Schindler commented on LUCENE-2302:

The name ExtendedTermAttribute is to be discussed :-) Any comments?

> Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence,
> --------------------------------------------------------------------------------------------------------
>                 Key: LUCENE-2302
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: Flex Branch
>            Reporter: Uwe Schindler
>             Fix For: Flex Branch
> For flexible indexing, terms can be simple byte[] arrays, while the current TermAttribute
only supports char[]. This is fine for plain text, but e.g. NumericTokenStream should work
directly on the byte[] array.
> TermAttribute also lacks some interfaces that would make it simpler for users to
work with it: Appendable and CharSequence.
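
To illustrate what those two interfaces would buy users, here is a minimal sketch of a term buffer that implements both. The class name CharTermSketch and all of its internals are hypothetical; this is not the patch attached to the issue, only one way such a class could look.

```java
import java.util.Arrays;

// Hypothetical stand-in for a term attribute implementation; NOT a Lucene class.
final class CharTermSketch implements CharSequence, Appendable {
    private char[] buffer = new char[16];
    private int length = 0;

    // Grow the backing array so that at least 'needed' chars fit.
    private void ensureCapacity(int needed) {
        if (needed > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
        }
    }

    // --- CharSequence ---
    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return buffer[index];
    }

    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException();
        }
        return new String(buffer, start, end - start);
    }

    // --- Appendable (covariant return type allows call chaining) ---
    @Override public CharTermSketch append(CharSequence csq) {
        CharSequence s = (csq == null) ? "null" : csq;
        return append(s, 0, s.length());
    }

    @Override public CharTermSketch append(CharSequence csq, int start, int end) {
        CharSequence s = (csq == null) ? "null" : csq;
        if (start < 0 || end > s.length() || start > end) {
            throw new IndexOutOfBoundsException();
        }
        ensureCapacity(length + (end - start));
        for (int i = start; i < end; i++) {
            buffer[length++] = s.charAt(i);
        }
        return this;
    }

    @Override public CharTermSketch append(char c) {
        ensureCapacity(length + 1);
        buffer[length++] = c;
        return this;
    }

    // Reset the term without releasing the backing array.
    void clear() { length = 0; }

    @Override public String toString() { return new String(buffer, 0, length); }
}
```

With this shape, a term can be passed to any JDK API that accepts a CharSequence (for example java.util.regex.Pattern.matches) or an Appendable (for example java.util.Formatter) without first copying it into a String.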
> I propose to create a new interface "ExtendedTermAttribute extends TermAttribute". The
corresponding -Impl class is always an implementation that implements ExtendedTermAttribute.
So if somebody adds a TermAttribute to an AttributeSource, he will get an implementation class
that can also be used as ExtendedTermAttribute. As both attributes create the same impl instance,
both calls to addAttribute are equivalent. So a TokenFilter that adds ExtendedTermAttribute to
the source will work with the same instance as the Tokenizer that requested the (deprecated)
TermAttribute.
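
The shared-instance idea can be sketched without any Lucene dependency. Everything below is a simplified assumption: the *Sketch types are hypothetical stand-ins, and the real AttributeSource resolves implementation classes through a factory rather than a hard-coded field.

```java
// Simplified stand-ins; these are NOT Lucene's real types.
interface TermAttributeSketch {
    String term();
    void setTermBuffer(String text);
}

// The new interface extends the old one and adds CharSequence.
interface ExtendedTermAttributeSketch extends TermAttributeSketch, CharSequence { }

// One implementation class serves both interfaces.
final class ExtendedTermAttributeImplSketch implements ExtendedTermAttributeSketch {
    private String value = "";

    @Override public String term() { return value; }
    @Override public void setTermBuffer(String text) { value = text; }
    @Override public int length() { return value.length(); }
    @Override public char charAt(int index) { return value.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) {
        return value.subSequence(start, end);
    }
    @Override public String toString() { return value; }
}

final class AttributeSourceSketch {
    // Created lazily on the first request, then reused for every later request.
    private ExtendedTermAttributeImplSketch termImpl;

    // Both the old and the new interface resolve to the same impl instance.
    <A> A addAttribute(Class<A> attInterface) {
        if (!attInterface.isAssignableFrom(ExtendedTermAttributeImplSketch.class)) {
            throw new IllegalArgumentException("unsupported attribute: " + attInterface);
        }
        if (termImpl == null) {
            termImpl = new ExtendedTermAttributeImplSketch();
        }
        return attInterface.cast(termImpl);
    }
}
```

In this sketch a Tokenizer calling addAttribute(TermAttributeSketch.class) and a TokenFilter calling addAttribute(ExtendedTermAttributeSketch.class) receive the same object, so a term written through the old interface is immediately visible through the new one.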
> To support both byte[] and char[], the internals will be implemented like Token in 2.9,
which supports both String and char[]. Both buffers are available, but only one of them
can be in use at a time. As soon as you call getByteBuffer() while the char[] buffer is in
use, its contents will be transformed.
So the indexer will always call getBytes() and get the UTF-8 bytes. NumericTokenStream will
modify the byte[] directly, and if no filter that uses char[] is plugged on top, the buffer
is never transformed.
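
A minimal sketch of this lazy two-buffer scheme follows. The class DualBufferTermSketch, the method getCharBuffer(), and the conversions counter are assumptions made for illustration; the sketch also assumes the byte[] content is valid UTF-8 whenever it is converted back to char[].

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of lazy char[] <-> byte[] conversion; NOT Lucene code.
final class DualBufferTermSketch {
    private char[] chars = new char[0];
    private byte[] bytes = new byte[0];
    // Marks which of the two buffers currently holds the term.
    private boolean charsValid = true;
    // Counts transformations, only to make the laziness observable.
    int conversions = 0;

    void setChars(char[] text) {
        chars = text.clone();
        charsValid = true;
    }

    void setBytes(byte[] raw) {
        bytes = raw.clone();
        charsValid = false;
    }

    // Encodes to UTF-8 only if the term currently lives in the char[] buffer.
    byte[] getByteBuffer() {
        if (charsValid) {
            bytes = new String(chars).getBytes(StandardCharsets.UTF_8);
            charsValid = false;
            conversions++;
        }
        return bytes;
    }

    // Decodes from UTF-8 only if the term currently lives in the byte[] buffer.
    char[] getCharBuffer() {
        if (!charsValid) {
            chars = new String(bytes, StandardCharsets.UTF_8).toCharArray();
            charsValid = true;
            conversions++;
        }
        return chars;
    }
}
```

A producer that only writes bytes and a consumer that only reads bytes never trigger a conversion here, which mirrors the NumericTokenStream case described above; a conversion happens only when a char[]-based consumer sits between them.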
> This issue will also convert the rest of NRQ (NumericRangeQuery) to byte[] and deprecate
all old methods in NumericUtils. NRQ will directly request ByteRef from splitRange and so on.
