lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-4642) Add create(AttributeFactory) to TokenizerFactory and subclasses with ctors taking AttributeFactory, and remove Tokenizer's and subclasses' ctors taking AttributeSource
Date Thu, 14 Mar 2013 23:56:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602894#comment-13602894 ]

Steve Rowe edited comment on LUCENE-4642 at 3/14/13 11:54 PM:
--------------------------------------------------------------

Patch:

- {{TokenizerFactory.create(Reader)}} is made final, and calls the {{AttributeFactory}}-accepting
version with {{AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY}}
- {{TokenizerFactory.create(AttributeFactory, Reader)}} is made abstract
- Added {{AttributeFactory}}-accepting constructors to all {{Tokenizer}}s with existing {{TokenizerFactory}}
subclasses that didn't already have them
- Removed {{create(Reader)}} from all TokenizerFactory subclasses.
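
The delegation described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration only: the {{Sketch*}} classes are hypothetical stand-ins and are not Lucene's real {{AttributeFactory}}, {{Tokenizer}} or {{TokenizerFactory}}; they model just enough to show the final/abstract split.

```java
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for an attribute factory; only carries a name for the demo.
class SketchAttributeFactory {
  static final SketchAttributeFactory DEFAULT = new SketchAttributeFactory("default");
  final String name;
  SketchAttributeFactory(String name) { this.name = name; }
}

// Hypothetical stand-in for a tokenizer that records which factory it was built with.
class SketchTokenizer {
  final SketchAttributeFactory factory;
  final Reader input;
  SketchTokenizer(SketchAttributeFactory factory, Reader input) {
    this.factory = factory;
    this.input = input;
  }
}

abstract class SketchTokenizerFactory {
  // Final: subclasses cannot override the Reader-only variant any more;
  // it always delegates with the default attribute factory.
  public final SketchTokenizer create(Reader input) {
    return create(SketchAttributeFactory.DEFAULT, input);
  }

  // Abstract: the single create method every subclass has to implement.
  public abstract SketchTokenizer create(SketchAttributeFactory factory, Reader input);
}

// Example subclass: implements only the factory-accepting variant.
class SketchWhitespaceTokenizerFactory extends SketchTokenizerFactory {
  @Override
  public SketchTokenizer create(SketchAttributeFactory factory, Reader input) {
    return new SketchTokenizer(factory, input);
  }
}

class CreateDelegationDemo {
  public static void main(String[] args) {
    SketchTokenizerFactory f = new SketchWhitespaceTokenizerFactory();
    SketchTokenizer viaDefault = f.create(new StringReader("a b"));
    SketchTokenizer viaCustom =
        f.create(new SketchAttributeFactory("custom"), new StringReader("a b"));
    System.out.println(viaDefault.factory.name); // prints "default"
    System.out.println(viaCustom.factory.name);  // prints "custom"
  }
}
```

With this shape each subclass has exactly one method to implement, and the Reader-only entry point cannot drift out of sync with it.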

In this patch there is a new, even more horrible hack in {{TrieTokenizer(Factory)}} - the {{AttributeFactory}}
argument to the {{TrieTokenizer}} constructor is *ignored*!!!  Surely there is a better way???:

{code:java}
public class TrieTokenizerFactory extends TokenizerFactory {
...
  @Override
  public TrieTokenizer create(AttributeFactory factory, Reader input) {
    return new TrieTokenizer(factory, input, type, TrieTokenizer.getNumericTokenStream(precisionStep));
  }
}

final class TrieTokenizer extends Tokenizer {
...
  public TrieTokenizer(Reader input, TrieTypes type, final NumericTokenStream ts) {
    this(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, input, type, ts);
  }

  public TrieTokenizer(AttributeFactory factory, Reader input, TrieTypes type, final NumericTokenStream ts) {
    // Hack #0: factory param is ignored
    // Häckidy-Hick-Hack #1: must share the attributes with the NumericTokenStream we delegate to,
    // so we create a fake factory:
    super(new AttributeFactory() {
      @Override
      public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
        return (AttributeImpl) ts.addAttribute(attClass);
      }
    }, input);
    // add all attributes:
    for (Iterator<Class<? extends Attribute>> it = ts.getAttributeClassesIterator(); it.hasNext();) {
      addAttribute(it.next());
    }
    this.type = type;
    this.ts = ts;
    // dates tend to be longer, especially when math is involved
    termAtt.resizeBuffer( type == TrieTypes.DATE ? 128 : 32 );
  }
{code}
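
To make the two hacks in the constructor above easier to see in isolation, here is a toy model of the same trick. It is only a sketch under stated assumptions: the {{Toy*}} and {{CountingFactory}} names are hypothetical and deliberately avoid Lucene's real classes. The outer source is given a "fake" factory that hands back the delegate's attribute instances, so both sides share state, and the factory passed in by the caller is never consulted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an attribute factory.
interface ToyFactory {
  Object create(Class<?> attClass);
}

// Hypothetical attribute type holding some per-token state.
class ToyTermAttribute {
  String term = "";
}

// Toy factory that counts how often it is asked to create an attribute.
// It only knows how to build ToyTermAttribute, which is enough for the demo.
class CountingFactory implements ToyFactory {
  int calls = 0;
  @Override
  public Object create(Class<?> attClass) {
    calls++;
    return new ToyTermAttribute();
  }
}

// Hypothetical stand-in for an attribute source: one instance per attribute class.
class ToySource {
  private final ToyFactory factory;
  private final Map<Class<?>, Object> attributes = new LinkedHashMap<>();

  ToySource(ToyFactory factory) {
    this.factory = factory;
  }

  Object addAttribute(Class<?> attClass) {
    return attributes.computeIfAbsent(attClass, factory::create);
  }
}

// Mirrors the shape of the hack: the caller's factory is accepted but ignored,
// and every attribute request is routed to the delegate instead.
class ToyDelegatingTokenizer extends ToySource {
  final ToySource delegate;

  ToyDelegatingTokenizer(ToyFactory ignoredFactory, ToySource delegate) {
    super(delegate::addAttribute); // the "fake factory"
    this.delegate = delegate;
  }
}

class SharedAttributesDemo {
  public static void main(String[] args) {
    CountingFactory callerFactory = new CountingFactory();
    ToySource delegate = new ToySource(new CountingFactory());
    ToyDelegatingTokenizer tokenizer = new ToyDelegatingTokenizer(callerFactory, delegate);

    Object fromTokenizer = tokenizer.addAttribute(ToyTermAttribute.class);
    Object fromDelegate = delegate.addAttribute(ToyTermAttribute.class);

    System.out.println(fromTokenizer == fromDelegate); // prints "true": one shared instance
    System.out.println(callerFactory.calls);           // prints "0": caller's factory unused
  }
}
```

The second printed value is the concern raised in the comment: whatever factory the caller supplies has no effect on the attributes the tokenizer ends up with.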
 
                
      was (Author: steve_rowe):
    Patch:

- {{TokenizerFactory.create(Reader)}} calls the {{AttributeFactory}}-accepting version with
{{AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY}}
- {{TokenizerFactory.create(AttributeFactory, Reader)}} is made abstract
- Added {{AttributeFactory}}-accepting constructors to all {{Tokenizer}}s with existing {{TokenizerFactory}}
subclasses that didn't already have them
- Removed {{create(Reader)}} from all TokenizerFactory subclasses.

In this patch there is a new, even more horrible hack in {{TrieTokenizer(Factory)}} - the {{AttributeFactory}}
argument to the {{TrieTokenizer}} constructor is *ignored*!!!  Surely there is a better way???:

{code:java}
public class TrieTokenizerFactory extends TokenizerFactory {
...
  @Override
  public TrieTokenizer create(AttributeFactory factory, Reader input) {
    return new TrieTokenizer(factory, input, type, TrieTokenizer.getNumericTokenStream(precisionStep));
  }
}

final class TrieTokenizer extends Tokenizer {
...
  public TrieTokenizer(Reader input, TrieTypes type, final NumericTokenStream ts) {
    this(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, input, type, ts);
  }

  public TrieTokenizer(AttributeFactory factory, Reader input, TrieTypes type, final NumericTokenStream ts) {
    // Hack #0: factory param is ignored
    // Häckidy-Hick-Hack #1: must share the attributes with the NumericTokenStream we delegate to,
    // so we create a fake factory:
    super(new AttributeFactory() {
      @Override
      public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
        return (AttributeImpl) ts.addAttribute(attClass);
      }
    }, input);
    // add all attributes:
    for (Iterator<Class<? extends Attribute>> it = ts.getAttributeClassesIterator(); it.hasNext();) {
      addAttribute(it.next());
    }
    this.type = type;
    this.ts = ts;
    // dates tend to be longer, especially when math is involved
    termAtt.resizeBuffer( type == TrieTypes.DATE ? 128 : 32 );
  }
{code}
 
                  
> Add create(AttributeFactory) to TokenizerFactory and subclasses with ctors taking AttributeFactory, and remove Tokenizer's and subclasses' ctors taking AttributeSource
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4642
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4642
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.1
>            Reporter: Renaud Delbru
>            Assignee: Steve Rowe
>              Labels: analysis, attribute, tokenizer
>             Fix For: 4.3
>
>         Attachments: LUCENE-4642.patch, LUCENE-4642.patch, LUCENE-4642.patch, LUCENE-4642.patch, LUCENE-4642-single-create-method-on-TokenizerFactory-subclasses.patch, TrieTokenizerFactory.java.patch
>
>
> All tokenizer implementations have a constructor that takes a given AttributeSource as parameter (LUCENE-1826).  These should be removed.
> TokenizerFactory does not provide an API to create tokenizers with a given AttributeFactory, but quite a few tokenizers have constructors that take an AttributeFactory.  TokenizerFactory should add a create(AttributeFactory) method, as should subclasses for tokenizers with AttributeFactory accepting ctors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

