lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
Date Fri, 02 May 2014 16:00:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987841#comment-13987841 ]

Uwe Schindler commented on LUCENE-5634:
---------------------------------------

Patch looks fine. I was afraid of the complexity, but this looks quite good. I am not sure about
backwards compatibility issues, but implementing your own IndexableField instance is still
very expert. With Java 8 we could handle that with default interface methods (LOOOOOOL).

The current patch is fine for the 2 special cases (StringTokenStream and NumericTokenStream), although it's
a bit risky if we add new "settings" to NTS (NumericTokenStream) or change its API (we should have equals...).
Maybe in LUCENE-5605 we can improve the check. If we pass FieldType directly to NTS and NRQ (NumericRangeQuery),
we can handle the whole thing by comparing the field type instead of relying on crazy internals like precStep.
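To make the field-type idea concrete, here is a minimal sketch, assuming a numeric field whose cached stream remembers the field type it was created for. SketchFieldType, SketchNumericStream and SketchNumericField are hypothetical stand-ins, not Lucene API; the point is only that the reuse decision is made by FieldType equality rather than by inspecting one internal setting such as the precision step.

```java
// Hypothetical stand-in for a field type: holds the settings that affect the stream.
final class SketchFieldType {
    final int precisionStep;

    SketchFieldType(int precisionStep) { this.precisionStep = precisionStep; }

    @Override
    public boolean equals(Object o) {
        return o instanceof SketchFieldType
            && ((SketchFieldType) o).precisionStep == precisionStep;
    }

    @Override
    public int hashCode() { return Integer.hashCode(precisionStep); }
}

// Hypothetical numeric stream that remembers the field type it was built for.
final class SketchNumericStream {
    final SketchFieldType type;
    long value;

    SketchNumericStream(SketchFieldType type) { this.type = type; }

    SketchNumericStream setLongValue(long v) {
        this.value = v;
        return this;
    }
}

// Hypothetical field: reuses the previous stream only if it was created
// for an equal field type, otherwise allocates a new one.
final class SketchNumericField {
    final SketchFieldType type;
    final long value;

    SketchNumericField(SketchFieldType type, long value) {
        this.type = type;
        this.value = value;
    }

    SketchNumericStream tokenStream(Object reuse) {
        SketchNumericStream ts;
        if (reuse instanceof SketchNumericStream
                && ((SketchNumericStream) reuse).type.equals(type)) {
            ts = (SketchNumericStream) reuse;
        } else {
            ts = new SketchNumericStream(type);
        }
        return ts.setLongValue(value);
    }
}
```

Adding a new setting then only means extending SketchFieldType's equals/hashCode; the reuse check in the field itself does not change.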

It would be great if, in the future, we could also remove the ThreadLocal from Analyzer by
using the same trick. Unfortunately, with the current contract on TokenStream it's hard to compare
streams unless we have a well-defined TokenStream#equals(). Ideally TokenStream#equals() should compare
the "settings" of the stream and its inputs (for Filters), but that is too advanced for the
simple 2 cases.
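A minimal sketch of what a settings-based equals() could look like, assuming a chain of one tokenizer and one filter. SketchStream, SketchTokenizer and SketchLengthFilter are hypothetical classes, not Lucene's TokenStream hierarchy; equality compares configuration only (never per-document state), and a filter compares its own settings and then delegates to its input's equals().

```java
import java.util.Objects;

// Hypothetical base class. equals() is meant to compare configuration only.
abstract class SketchStream {}

// Hypothetical tokenizer with a single setting.
final class SketchTokenizer extends SketchStream {
    final int maxTokenLength;

    SketchTokenizer(int maxTokenLength) { this.maxTokenLength = maxTokenLength; }

    @Override
    public boolean equals(Object o) {
        return o instanceof SketchTokenizer
            && ((SketchTokenizer) o).maxTokenLength == maxTokenLength;
    }

    @Override
    public int hashCode() { return Integer.hashCode(maxTokenLength); }
}

// Hypothetical filter: equal if its own settings match and its inputs are equal.
final class SketchLengthFilter extends SketchStream {
    final SketchStream input;
    final int min;
    final int max;

    SketchLengthFilter(SketchStream input, int min, int max) {
        this.input = Objects.requireNonNull(input);
        this.min = min;
        this.max = max;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SketchLengthFilter)) return false;
        SketchLengthFilter other = (SketchLengthFilter) o;
        // Compare this filter's settings, then recurse into the wrapped input.
        return min == other.min && max == other.max && input.equals(other.input);
    }

    @Override
    public int hashCode() { return Objects.hash(input, min, max); }
}
```

For the comparison to mean anything, every Tokenizer and TokenFilter in a chain would need such an implementation.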

Another solution would be to have some "holder" around the cached TokenStream that provides
hashCode/equals. With that, a Field could better determine whether the cached TokenStream is its own
(e.g. by putting a reference to its field type into the holder).
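The holder idea can be sketched like this, assuming the holder stores the field type as its identity and wraps an arbitrary stream object. StreamHolder and StreamHolders.reuseOrCreate are hypothetical names for illustration only; the field type is passed as a plain Object so the sketch does not depend on any Lucene class.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical holder pairing a cached stream with the field type that created it.
// equals()/hashCode() use the field type only, so two holders are equal
// when they serve the same kind of field.
final class StreamHolder {
    final Object fieldType;
    final Object stream;

    StreamHolder(Object fieldType, Object stream) {
        this.fieldType = Objects.requireNonNull(fieldType);
        this.stream = Objects.requireNonNull(stream);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof StreamHolder
            && fieldType.equals(((StreamHolder) o).fieldType);
    }

    @Override
    public int hashCode() { return fieldType.hashCode(); }
}

final class StreamHolders {
    private StreamHolders() {}

    // Returns the cached holder if it was created for this field type,
    // otherwise wraps a freshly created stream in a new holder.
    static StreamHolder reuseOrCreate(StreamHolder reuse, Object fieldType,
                                      Supplier<?> newStream) {
        if (reuse != null && reuse.fieldType.equals(fieldType)) {
            return reuse;
        }
        return new StreamHolder(fieldType, newStream.get());
    }
}
```

The stream classes themselves stay untouched; only the holder needs a well-defined equals(), which avoids defining TokenStream#equals() for every filter.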

> Reuse TokenStream instances in Field
> ------------------------------------
>
>                 Key: LUCENE-5634
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5634
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.9, 5.0
>
>         Attachments: LUCENE-5634.patch, LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...
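The static ThreadLocal idea from the issue description can be sketched as follows, assuming hypothetical SketchStringStream and SketchStringField classes in place of Lucene's StringTokenStream and StringField: each indexing thread gets one stream instance that is reset with the new value for every field instead of being reallocated.

```java
// Hypothetical stand-in for StringTokenStream: emits its whole value as one token.
final class SketchStringStream {
    private String value;
    private boolean used = true;

    // Resets the stream for a new field value instead of allocating a new stream.
    SketchStringStream setValue(String v) {
        value = v;
        used = false;
        return this;
    }

    // Returns the single token once, then null to signal end of stream.
    String next() {
        if (used) return null;
        used = true;
        return value;
    }
}

// Hypothetical field that pulls its stream from a per-thread cache.
final class SketchStringField {
    // One reusable stream per indexing thread, so no per-document allocation.
    private static final ThreadLocal<SketchStringStream> REUSE =
        ThreadLocal.withInitial(SketchStringStream::new);

    private final String value;

    SketchStringField(String value) { this.value = value; }

    SketchStringStream tokenStream() {
        return REUSE.get().setValue(value);
    }
}
```

As with any per-thread cache, a stream obtained this way must be fully consumed before the next field on the same thread asks for one, since both get the same instance.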



--
This message was sent by Atlassian JIRA
(v6.2#6252)


