lucene-dev mailing list archives

From "Tim Smith (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-1826) All Tokenizer implementations should have constructor that takes an AttributeSource
Date Thu, 20 Aug 2009 17:38:15 GMT
All Tokenizer implementations should have constructor that takes an AttributeSource
-----------------------------------------------------------------------------------

                 Key: LUCENE-1826
                 URL: https://issues.apache.org/jira/browse/LUCENE-1826
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Analysis
    Affects Versions: 2.9
            Reporter: Tim Smith


I have a TokenStream implementation that joins together multiple sub-TokenStreams. (I then
do additional filtering on top of this, so I can't just have the indexer do the merging.)

In 2.4, this worked fine: once one sub-stream was exhausted, I just started using the next stream.

However, in 2.9 this is very difficult, and it requires copying term buffers for every token
being aggregated.
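To make that copying cost concrete, here is a minimal, self-contained Java model. It is not Lucene code: `OwnTermStream`, `CopyingConcat` and `CopyingConcatDemo` are hypothetical classes, and a plain `StringBuilder` stands in for a term attribute. Each sub-stream owns a private term buffer, as happens when the streams do not share an AttributeSource, so the concatenating stream has to copy every token into its own buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model, not Lucene's API: each sub-stream owns a private term
// buffer, like TokenStreams that do not share an AttributeSource.
class OwnTermStream {
    final StringBuilder term = new StringBuilder();
    private final Iterator<String> words;

    OwnTermStream(String... words) {
        this.words = Arrays.asList(words).iterator();
    }

    boolean incrementToken() {
        if (!words.hasNext()) return false;
        term.setLength(0);
        term.append(words.next());
        return true;
    }
}

// Concatenating stream with its own buffer: every token must be copied
// from the active sub-stream's buffer into this one.
class CopyingConcat {
    final StringBuilder term = new StringBuilder();
    int copies = 0; // number of per-token buffer copies performed
    private final List<OwnTermStream> subs;
    private int current = 0;

    CopyingConcat(List<OwnTermStream> subs) {
        this.subs = subs;
    }

    boolean incrementToken() {
        while (current < subs.size()) {
            OwnTermStream sub = subs.get(current);
            if (sub.incrementToken()) {
                term.setLength(0);
                term.append(sub.term); // per-token copy between separate buffers
                copies++;
                return true;
            }
            current++; // sub-stream exhausted: move to the next one
        }
        return false;
    }
}

public class CopyingConcatDemo {
    public static void main(String[] args) {
        CopyingConcat concat = new CopyingConcat(Arrays.asList(
                new OwnTermStream("a", "b"), new OwnTermStream("c", "d")));
        List<String> out = new ArrayList<>();
        while (concat.incrementToken()) out.add(concat.term.toString());
        System.out.println(out + " copies=" + concat.copies); // [a, b, c, d] copies=4
    }
}
```

The number of copies grows with the number of tokens, which is the overhead described above; a real term attribute would typically copy a char[] buffer rather than a `StringBuilder`.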

But if all the sub-TokenStreams share the same AttributeSource, and my "concat" TokenStream
shares that same AttributeSource too, this goes back to being very simple (and very efficient).
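The shared-state approach can be sketched with another self-contained Java model. Again this is not Lucene's API: `SharedAttributes` is a hypothetical stand-in for an AttributeSource holding one term buffer, and `SharedWordStream` and `SharedConcat` are illustrative names. Because every sub-stream writes into the same object the concat stream exposes, the concat stream only delegates and advances to the next sub-stream when one is exhausted. It also rejects sub-streams built on different state, since the trick only works when the state really is shared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a shared AttributeSource: one term buffer that
// every stream in the chain reads and writes. Not Lucene's API.
class SharedAttributes {
    final StringBuilder term = new StringBuilder();
}

abstract class SharedStream {
    protected final SharedAttributes attrs;
    SharedStream(SharedAttributes attrs) { this.attrs = attrs; }
    abstract boolean incrementToken();
}

// A sub-stream that writes each word straight into the shared buffer.
class SharedWordStream extends SharedStream {
    private final Iterator<String> words;
    SharedWordStream(SharedAttributes attrs, String... words) {
        super(attrs);
        this.words = Arrays.asList(words).iterator();
    }
    @Override boolean incrementToken() {
        if (!words.hasNext()) return false;
        attrs.term.setLength(0);
        attrs.term.append(words.next());
        return true;
    }
}

// The "concat" stream: since all sub-streams share its attributes, it only
// delegates and advances; no per-token copying is needed.
class SharedConcat extends SharedStream {
    private final List<SharedStream> subs;
    private int current = 0;
    SharedConcat(SharedAttributes attrs, List<SharedStream> subs) {
        super(attrs);
        for (SharedStream s : subs) {
            if (s.attrs != attrs) {
                throw new IllegalArgumentException("sub-stream does not share attributes");
            }
        }
        this.subs = subs;
    }
    @Override boolean incrementToken() {
        while (current < subs.size()) {
            if (subs.get(current).incrementToken()) return true;
            current++; // current sub-stream exhausted: move to the next one
        }
        return false;
    }
}

public class SharedConcatDemo {
    static List<String> run() {
        SharedAttributes shared = new SharedAttributes();
        List<SharedStream> subs = new ArrayList<>();
        subs.add(new SharedWordStream(shared, "a", "b"));
        subs.add(new SharedWordStream(shared, "c", "d"));
        SharedConcat concat = new SharedConcat(shared, subs);
        List<String> out = new ArrayList<>();
        while (concat.incrementToken()) out.add(concat.attrs.term.toString());
        return out;
    }
    public static void main(String[] args) {
        System.out.println(run()); // [a, b, c, d]
    }
}
```

The design choice is that the caller creates the shared state once and hands it to every stream at construction time, which is why the streams need a constructor that accepts it.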


So, for example, I would like to see the following constructor added to StandardTokenizer:
{code}
  public StandardTokenizer(AttributeSource source, Reader input, boolean replaceInvalidAcronym) {
    super(source);
    ...
  }
{code}

I would likewise like similar constructors added to all Tokenizer subclasses provided by Lucene.
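The constructor pattern being requested can be sketched with a small self-contained Java model. The classes `TermState`, `SketchTokenizer`, `SketchWhitespaceTokenizer` and `SharedCtorDemo` are hypothetical and only mimic the idea: the base class and a subclass each offer one constructor that creates private state and one that accepts shared state and chains to `super`, much as the proposed `super(source)` call assumes a Tokenizer superclass constructor taking an AttributeSource. Two tokenizers over different Readers then write into a single shared term buffer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an attribute source holding the current term.
class TermState {
    final StringBuilder term = new StringBuilder();
}

// Hypothetical base class offering both construction styles.
abstract class SketchTokenizer {
    protected final TermState state;
    protected final Reader input;
    SketchTokenizer(Reader input) { this(new TermState(), input); } // private state
    SketchTokenizer(TermState shared, Reader input) {              // shared state
        this.state = shared;
        this.input = input;
    }
    abstract boolean incrementToken() throws IOException;
}

// A whitespace-splitting subclass that exposes BOTH constructors, the
// pattern the issue asks every Tokenizer subclass to follow.
class SketchWhitespaceTokenizer extends SketchTokenizer {
    SketchWhitespaceTokenizer(Reader input) { super(input); }
    SketchWhitespaceTokenizer(TermState shared, Reader input) { super(shared, input); }

    @Override boolean incrementToken() throws IOException {
        state.term.setLength(0);
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (state.term.length() > 0) return true; // whitespace ends a token
            } else {
                state.term.append((char) c);
            }
        }
        return state.term.length() > 0; // final token at end of input, if any
    }
}

public class SharedCtorDemo {
    static List<String> tokens() {
        TermState shared = new TermState();
        SketchTokenizer[] parts = {
            new SketchWhitespaceTokenizer(shared, new StringReader("hello world")),
            new SketchWhitespaceTokenizer(shared, new StringReader("  foo "))
        };
        List<String> out = new ArrayList<>();
        try {
            for (SketchTokenizer t : parts) {
                while (t.incrementToken()) out.add(shared.term.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
    public static void main(String[] args) {
        System.out.println(tokens()); // [hello, world, foo]
    }
}
```

Keeping the Reader-only constructor alongside the shared-state one means existing callers are unaffected; only code that wants to chain tokenizers needs the new overload.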


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

