lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1333) Token implementation needs improvements
Date Wed, 13 Aug 2008 11:14:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622168#action_12622168
] 

Grant Ingersoll commented on LUCENE-1333:
-----------------------------------------

My point is it is already cloned when it is added to the list, so now it is being cloned twice,
and I think the second clone is extraneous.  In other words, it already is returning a clone
from next.  The only way that Token could be changed is by doing it via the getTokens() method,
which would have to be done outside of the Analysis process, which I would argue does not
violate the reusability process.

Cloning Tokens is not cheap, as I recall.  In fact, I seem to recall testing that it was cheaper
to do a new.  Now, maybe that is fixed in this issue, but I am not sure.  

> Token implementation needs improvements
> ---------------------------------------
>
>                 Key: LUCENE-1333
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1333
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 2.3.1
>         Environment: All
>            Reporter: DM Smith
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-1333-analysis.patch, LUCENE-1333-analyzers.patch, LUCENE-1333-core.patch,
LUCENE-1333-highlighter.patch, LUCENE-1333-instantiated.patch, LUCENE-1333-lucli.patch, LUCENE-1333-memory.patch,
LUCENE-1333-miscellaneous.patch, LUCENE-1333-queries.patch, LUCENE-1333-snowball.patch, LUCENE-1333-wikipedia.patch,
LUCENE-1333-wordnet.patch, LUCENE-1333-xml-query-parser.patch, LUCENE-1333.patch, LUCENE-1333.patch,
LUCENE-1333.patch, LUCENE-1333.patch, LUCENE-1333.patch, LUCENE-1333.patch, LUCENE-1333.patch,
LUCENE-1333a.txt
>
>
> This was discussed in the thread (not sure which place is best to reference so here are
two):
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200805.mbox/%3C21F67CC2-EBB4-48A0-894E-FBA4AECC0D50@gmail.com%3E
> or to see it all at once:
> http://www.gossamer-threads.com/lists/lucene/java-dev/62851
> Issues:
> 1. JavaDoc is insufficient, leading one to read the code to figure out how to use the
class.
> 2. Deprecations are incomplete. The constructors that take String as an argument and
the methods that take and/or return String should *all* be deprecated.
> 3. The allocation policy is too aggressive. With large tokens the resulting buffer can
be over-allocated. A less aggressive algorithm would be better. In the thread, the Python
example is good as it is computationally simple.
> 4. The parts of the code that currently use Token's deprecated methods can be upgraded
now rather than waiting for 3.0. As it stands, filter chains that alternate between char[]
and String are sub-optimal. Currently, it is used in core by Query classes. The rest are in
contrib, mostly in analyzers.
> 5. Some internal optimizations can be done with regard to char[] allocation.
> 6. TokenStream has next() and next(Token), next() should be deprecated, so that reuse
is maximized and descendant classes should be rewritten to over-ride next(Token)
> 7. Tokens are often stored as a String in a Term. It would be good to add constructors
that took a Token. This would simplify the use of the two together.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message