lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements
Date Tue, 21 Jul 2009 17:44:38 GMT
bq. * Are we going to keep Token.java or not?  (Current patch still has it
deprecated).
I need to know this as well. If this class is deprecated, I will have to
create a new Token class for the Highlighter package. Keeping it around
would be a convenience.

On Tue, Jul 21, 2009 at 8:51 AM, Grant Ingersoll (JIRA) <jira@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733604#action_12733604]
>
> Grant Ingersoll commented on LUCENE-1693:
> -----------------------------------------
>
> One of the things that works really well in Solr is that any time some
> significant JIRA issue is undertaken, a Wiki page is also created that
> documents the ideas in the patch as well as how to use it, so the final
> page effectively becomes the documentation.  I
> know Mike and Uwe have done a ton of work on this, but would it be too much
> trouble to ask for a Wiki page that describes the current state of the
> patch?  It is really hard to follow, in JIRA, all the different threads and
> which ones are still valid and which are not.
>
> > AttributeSource/TokenStream API improvements
> > --------------------------------------------
> >
> >                 Key: LUCENE-1693
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1693
> >             Project: Lucene - Java
> >          Issue Type: Improvement
> >          Components: Analysis
> >            Reporter: Michael Busch
> >            Assignee: Michael Busch
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: LUCENE-1693.patch, LUCENE-1693.patch,
> lucene-1693.patch, LUCENE-1693.patch, lucene-1693.patch, LUCENE-1693.patch,
> LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch,
> LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch,
> LUCENE-1693.patch, LUCENE-1693.patch, lucene-1693.patch, PerfTest3.java,
> TestAPIBackwardsCompatibility.java, TestCompatibility.java,
> TestCompatibility.java, TestCompatibility.java, TestCompatibility.java
> >
> >
> > This patch makes the following improvements to AttributeSource and
> > TokenStream/Filter:
> > - removes the set/getUseNewAPI() methods (including the standard
> >   ones). Instead by default incrementToken() throws a subclass of
> >   UnsupportedOperationException. The indexer tries to call
> >   incrementToken() initially once to see if the exception is thrown;
> >   if so, it falls back to the old API.
> > - introduces interfaces for all Attributes. The corresponding
> >   implementations have the postfix 'Impl', e.g. TermAttribute and
> >   TermAttributeImpl. AttributeSource now has a factory for creating
> >   the Attribute instances; the default implementation looks for
> >   implementing classes with the postfix 'Impl'. Token now implements
> >   all 6 TokenAttribute interfaces.
> > - new method added to AttributeSource:
> >   addAttributeImpl(AttributeImpl). Using reflection it walks up in the
> >   class hierarchy of the passed in object and finds all interfaces
> >   that the class or superclasses implement and that extend the
> >   Attribute interface. It then adds the interface->instance mappings
> >   to the attribute map for each of the found interfaces.
> > - AttributeImpl now has a default implementation of toString that uses
> >   reflection to print out the values of the attributes in a default
> >   formatting. This makes it a bit easier to implement AttributeImpl,
> >   because toString() was declared abstract before.
> > - Cloning is now done much more efficiently in
> >   captureState. The method figures out which unique AttributeImpl
> >   instances are contained as values in the attributes map, because
> >   those are the ones that need to be cloned. It creates a single
> >   linked list that supports deep cloning (in the inner class
> >   AttributeSource.State). AttributeSource keeps track of when this
> >   state changes, i.e. whenever new attributes are added to the
> >   AttributeSource. Only in that case will captureState recompute the
> >   state, otherwise it will simply clone the precomputed state and
> >   return the clone. restoreState(AttributeSource.State) walks the
> >   linked list and uses the copyTo() method of AttributeImpl to copy
> >   all values over into the attribute that the source stream
> >   (e.g. SinkTokenizer) uses.
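
To make the captureState/restoreState bullet quoted above more concrete, here is a minimal sketch of the idea, not the code from the patch. It assumes hypothetical stand-in types (Attr, AttrImpl, TokenLike, SketchSource) rather than the real Lucene classes: the unique implementation instances are collected into a singly linked list once, that list is cached until the attribute set changes, and each capture returns a deep clone of it.

```java
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the Lucene classes.
interface Attr {}
interface TermAttr extends Attr {}
interface FlagsAttr extends Attr {}

abstract class AttrImpl implements Cloneable {
    // Copy this instance's values into a target of the same concrete class.
    abstract void copyTo(AttrImpl target);

    @Override
    public AttrImpl clone() {
        try {
            return (AttrImpl) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

// One instance backing two attribute interfaces, similar in spirit to Token.
final class TokenLike extends AttrImpl implements TermAttr, FlagsAttr {
    String term = "";
    int flags;

    @Override
    void copyTo(AttrImpl target) {
        TokenLike t = (TokenLike) target;
        t.term = term;
        t.flags = flags;
    }
}

final class SketchSource {
    // Singly linked list of the unique implementation instances.
    static final class State {
        AttrImpl attribute;
        State next;

        State deepClone() {
            State copy = new State();
            copy.attribute = attribute.clone();
            copy.next = (next == null) ? null : next.deepClone();
            return copy;
        }
    }

    private final Map<Class<? extends Attr>, AttrImpl> attributes = new LinkedHashMap<>();
    private State cached; // null means "recompute on the next captureState()"

    void add(Class<? extends Attr> iface, AttrImpl impl) {
        attributes.put(iface, impl);
        cached = null; // the attribute set changed, so the cached list is stale
    }

    State captureState() {
        if (cached == null) {
            // Several interfaces may map to the same instance; list each once.
            Map<AttrImpl, Boolean> seen = new IdentityHashMap<>();
            State tail = null;
            for (AttrImpl impl : attributes.values()) {
                if (seen.put(impl, Boolean.TRUE) != null) continue;
                State node = new State();
                node.attribute = impl;
                if (tail == null) cached = node; else tail.next = node;
                tail = node;
            }
        }
        return (cached == null) ? null : cached.deepClone();
    }

    void restoreState(State state) {
        // Walk the list and copy each saved value into the matching live instance.
        for (State s = state; s != null; s = s.next) {
            for (AttrImpl live : attributes.values()) {
                if (live.getClass() == s.attribute.getClass()) {
                    s.attribute.copyTo(live);
                }
            }
        }
    }
}
```

In this sketch, registering one TokenLike under both interfaces yields a list with a single node, so each capture clones one object instead of two.
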
> > Cloning performance can be greatly improved if a TokenStream avoids
> > using multiple AttributeImpl instances. A user can, for example,
> > simply add a Token instance to the stream instead of the individual
> > attributes. Or the user could implement a subclass of AttributeImpl that
> > implements exactly the Attribute interfaces needed. I think this
> > should be considered an expert API (addAttributeImpl), as this manual
> > optimization is only needed if cloning performance is crucial. I ran
> > some quick performance tests using Tee/Sink tokenizers (which do
> > cloning) and the performance was roughly 20% faster with the new
> > API. I'll run some more performance tests and post more numbers then.
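
The reflection walk that the quoted description attributes to addAttributeImpl can be sketched roughly as follows. This is an illustration under assumptions, not the patch's code: DemoAttribute, DemoTerm, DemoOffset, DemoBaseImpl, DemoTokenImpl and AttributeInterfaceFinder are hypothetical names, and the sketch only inspects interfaces declared directly on each class in the hierarchy.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical marker and attribute interfaces; not the Lucene types.
interface DemoAttribute {}
interface DemoTerm extends DemoAttribute {}
interface DemoOffset extends DemoAttribute {}

class DemoBaseImpl implements DemoTerm {}

// Serializable is included to show that unrelated interfaces are skipped.
class DemoTokenImpl extends DemoBaseImpl implements DemoOffset, java.io.Serializable {
    private static final long serialVersionUID = 1L;
}

final class AttributeInterfaceFinder {
    // Walk up the class hierarchy and collect every directly declared
    // interface that extends the marker interface.
    static Set<Class<?>> find(Class<?> implClass, Class<?> marker) {
        Set<Class<?>> found = new LinkedHashSet<>();
        for (Class<?> c = implClass; c != null; c = c.getSuperclass()) {
            for (Class<?> iface : c.getInterfaces()) {
                if (iface != marker && marker.isAssignableFrom(iface)) {
                    found.add(iface);
                }
            }
        }
        return found;
    }
}
```

Each interface found this way would then be mapped to the same instance in the attribute map, which is what lets a single object serve several attributes.
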
> > Note also that when we add serialization to the Attributes, e.g. to
> > support storing serialized TokenStreams in the index, serialization
> > should benefit from the new API even more than cloning does.
> > Also, the TokenStream API does not change, except for the removal
> > of the set/getUseNewAPI methods. So the patches in LUCENE-1460
> > should still work.
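
The probe-and-fall-back behavior that replaces set/getUseNewAPI(), as described in the first bullet of the quoted description, might look something like the sketch below. All names here (NewApiNotImplemented, SketchStream, OldStyleStream, NewStyleStream, SketchConsumer) are hypothetical, and the real indexer logic may differ: the consumer calls incrementToken() once, and if the default implementation throws, it switches to the old next()-style loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the Lucene TokenStream API.
final class NewApiNotImplemented extends UnsupportedOperationException {
    private static final long serialVersionUID = 1L;
}

abstract class SketchStream {
    String term; // current term, set by new-style streams

    // New-style method: the default signals "not implemented" by throwing.
    boolean incrementToken() {
        throw new NewApiNotImplemented();
    }

    // Old-style method: returns the next term, or null at the end.
    String next() {
        return null;
    }
}

final class OldStyleStream extends SketchStream {
    private final String[] terms;
    private int pos;

    OldStyleStream(String... terms) { this.terms = terms; }

    @Override
    String next() { return pos < terms.length ? terms[pos++] : null; }
}

final class NewStyleStream extends SketchStream {
    private final String[] terms;
    private int pos;

    NewStyleStream(String... terms) { this.terms = terms; }

    @Override
    boolean incrementToken() {
        if (pos >= terms.length) return false;
        term = terms[pos++];
        return true;
    }
}

final class SketchConsumer {
    static List<String> consume(SketchStream stream) {
        List<String> out = new ArrayList<>();
        boolean hasToken;
        try {
            // Probe the new API once.
            hasToken = stream.incrementToken();
        } catch (NewApiNotImplemented e) {
            // The stream only implements the old API; fall back to it.
            for (String t = stream.next(); t != null; t = stream.next()) {
                out.add(t);
            }
            return out;
        }
        while (hasToken) {
            out.add(stream.term);
            hasToken = stream.incrementToken();
        }
        return out;
    }
}
```

Catching only the dedicated subclass, rather than any UnsupportedOperationException, keeps the fallback from masking unrelated failures inside a stream.
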
> > All core tests pass; however, I still need to update all the
> > documentation and add some unit tests for the new AttributeSource
> > functionality. So this patch is not ready to commit yet, but I wanted
> > to post it now to get some feedback.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


-- 
- Mark

http://www.lucidimagination.com
