lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)
Date Sun, 07 Mar 2010 20:40:27 GMT
Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence,
Appendable)
--------------------------------------------------------------------------------------------------------

                 Key: LUCENE-2302
                 URL: https://issues.apache.org/jira/browse/LUCENE-2302
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Analysis
    Affects Versions: Flex Branch
            Reporter: Uwe Schindler
             Fix For: Flex Branch


For flexible indexing, terms can be plain byte[] arrays, while the current TermAttribute only
supports char[]. This is fine for plain text, but e.g. NumericTokenStream should work directly
on the byte[] array.
TermAttribute also lacks some interfaces that would make it simpler for users to work
with: Appendable and CharSequence.

I propose to create a new interface "ExtendedTermAttribute extends TermAttribute". The corresponding
-Impl class always implements ExtendedTermAttribute. So if somebody adds a TermAttribute to an
AttributeSource, he will get an implementation class that can also be used as
ExtendedTermAttribute. As both attributes create the same impl instance, both calls to addAttribute
are equivalent. So a TokenFilter that adds ExtendedTermAttribute to the source will work with the
same instance as the Tokenizer that requested the (deprecated) TermAttribute.
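
A minimal sketch of the shared-instance idea, not the actual patch. TermAttr,
ExtendedTermAttr, TermAttrImpl and SimpleAttributeSource are hypothetical stand-ins
for the Lucene classes, and the registry below is an assumption about how one impl
could be registered under both interfaces:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the old (deprecated) attribute interface.
interface TermAttr {
    String term();
    void setTerm(String s);
}

// Hypothetical stand-in for the proposed extended interface.
// The append methods are redeclared without "throws IOException".
interface ExtendedTermAttr extends TermAttr, CharSequence, Appendable {
    ExtendedTermAttr append(CharSequence cs);
    ExtendedTermAttr append(CharSequence cs, int start, int end);
    ExtendedTermAttr append(char c);
}

// One impl class that serves both interfaces.
final class TermAttrImpl implements ExtendedTermAttr {
    private final StringBuilder buf = new StringBuilder();

    public String term() { return buf.toString(); }
    public void setTerm(String s) { buf.setLength(0); buf.append(s); }

    public int length() { return buf.length(); }
    public char charAt(int i) { return buf.charAt(i); }
    public CharSequence subSequence(int s, int e) { return buf.subSequence(s, e); }

    public ExtendedTermAttr append(CharSequence cs) { buf.append(cs); return this; }
    public ExtendedTermAttr append(CharSequence cs, int s, int e) { buf.append(cs, s, e); return this; }
    public ExtendedTermAttr append(char c) { buf.append(c); return this; }

    @Override public String toString() { return buf.toString(); }
}

// Simplified registry: the first request for either interface creates the impl
// and registers it under both, so later requests return the same instance.
final class SimpleAttributeSource {
    private final Map<Class<?>, Object> byInterface = new HashMap<>();

    <T> T addAttribute(Class<T> iface) {
        if (iface != TermAttr.class && iface != ExtendedTermAttr.class) {
            throw new IllegalArgumentException("no impl known for " + iface);
        }
        Object impl = byInterface.get(iface);
        if (impl == null) {
            impl = new TermAttrImpl();
            byInterface.put(TermAttr.class, impl);
            byInterface.put(ExtendedTermAttr.class, impl);
        }
        return iface.cast(impl);
    }
}
```

With this sketch, a tokenizer that asks for TermAttr.class and a filter that asks
for ExtendedTermAttr.class on the same source both operate on one TermAttrImpl.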

To support both byte[] and char[], the internals will be implemented like Token in 2.9, which
supports both String and char[]. Both buffers are available, but only one of them can be in use
at a time. As soon as you call getByteBuffer() while the char[] buffer is in use, the content will
be transformed. So the indexer will always call getBytes() and get the UTF-8 bytes. NumericTokenStream
will modify the byte[] directly, and if no filter that uses char[] is plugged on top, the buffer
is never transformed.
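
A rough sketch of the dual-buffer behaviour described above, under stated
assumptions. DualBufferTerm and its method names are hypothetical and not part of
Lucene. The conversions counter exists only to make the lazy transformation visible:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder with a char[] and a byte[] buffer, only one active at a time.
final class DualBufferTerm {
    private enum Active { CHARS, BYTES }

    private char[] chars = new char[0];
    private byte[] bytes = new byte[0];
    private Active active = Active.CHARS;

    // Counts transformations between the two buffers (illustration only).
    int conversions = 0;

    void setChars(String s) {
        chars = s.toCharArray();
        active = Active.CHARS;
    }

    void setBytes(byte[] b) {
        bytes = b.clone();
        active = Active.BYTES;
    }

    // Returns the byte buffer, encoding the chars as UTF-8 only if chars are active.
    byte[] getBytes() {
        if (active == Active.CHARS) {
            bytes = new String(chars).getBytes(StandardCharsets.UTF_8);
            active = Active.BYTES;
            conversions++;
        }
        return bytes;
    }

    // Returns the char buffer, decoding the bytes as UTF-8 only if bytes are active.
    // Assumes the bytes are valid UTF-8; arbitrary binary terms would not round-trip.
    char[] getChars() {
        if (active == Active.BYTES) {
            chars = new String(bytes, StandardCharsets.UTF_8).toCharArray();
            active = Active.CHARS;
            conversions++;
        }
        return chars;
    }
}
```

A producer that only calls setBytes() followed by a consumer that only calls
getBytes() never triggers a conversion in this sketch, which mirrors the
NumericTokenStream case above.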

This issue will also convert the rest of NumericRangeQuery (NRQ) to byte[] and deprecate all old
methods in NumericUtils. NRQ will directly request ByteRef from splitRange and so on.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

