lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
Date Mon, 10 Aug 2009 13:50:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741319#action_12741319 ]

Shai Erera commented on LUCENE-1794:
------------------------------------

Robert, wouldn't it make sense to pull SavedStreams (maybe renamed ReusableStreams?) up into
Analyzer and have all the subclasses use it? I couldn't help noticing that this code is
duplicated in all the Analyzers.
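For illustration, here is a minimal Java sketch of what pulling the duplicated holder up into a shared base class could look like. SketchTokenizer, SketchTokenStream, ReusingAnalyzer and WhitespaceLikeAnalyzer are hypothetical stand-ins, not Lucene's API. ReusableStreams borrows the name suggested above, and a plain ThreadLocal stands in for whatever per-thread storage the real Analyzer would use.

```java
import java.io.Reader;

// Hypothetical stand-in for a Lucene Tokenizer (not the real class).
class SketchTokenizer {
    Reader input;

    SketchTokenizer(Reader input) { this.input = input; }

    // Point the existing tokenizer at new input instead of allocating a new one.
    void reset(Reader input) { this.input = input; }
}

// Hypothetical stand-in for the end of a filter chain wrapping the tokenizer.
class SketchTokenStream {
    final SketchTokenizer source;

    SketchTokenStream(SketchTokenizer source) { this.source = source; }
}

// The holder that each contrib analyzer currently duplicates as SavedStreams.
final class ReusableStreams {
    final SketchTokenizer source;
    final SketchTokenStream result;

    ReusableStreams(SketchTokenizer source, SketchTokenStream result) {
        this.source = source;
        this.result = result;
    }
}

// Hypothetical base class: the reuse bookkeeping is written once, here.
abstract class ReusingAnalyzer {
    private final ThreadLocal<ReusableStreams> saved = new ThreadLocal<>();

    // Subclasses only describe how to build a fresh tokenizer + filter chain.
    protected abstract ReusableStreams createStreams(Reader reader);

    public SketchTokenStream tokenStream(String field, Reader reader) {
        return createStreams(reader).result; // always a new chain
    }

    public SketchTokenStream reusableTokenStream(String field, Reader reader) {
        ReusableStreams streams = saved.get();
        if (streams == null) {
            streams = createStreams(reader); // first call on this thread
            saved.set(streams);
        } else {
            streams.source.reset(reader);    // later calls: re-point the saved chain
        }
        return streams.result;
    }
}

// Example subclass: it contains no reuse code of its own.
class WhitespaceLikeAnalyzer extends ReusingAnalyzer {
    @Override
    protected ReusableStreams createStreams(Reader reader) {
        SketchTokenizer source = new SketchTokenizer(reader);
        return new ReusableStreams(source, new SketchTokenStream(source));
    }
}
```

In this layout a subclass only says how its chain is built in createStreams. The get-or-create-then-reset logic exists in exactly one place.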

Also (and this may belong in a separate issue), the fact that reusableTokenStream accepts a field
name is misleading. On one hand, it suggests you can safely call a.rts("a") and a.rts("b"); on the
other, it is documented as not safe in that way (i.e., don't call this method if you need more than
one token stream from an analyzer at the same time).
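To make the hazard concrete, here is a small self-contained Java sketch. It assumes an analyzer that keeps a single saved stream per thread and ignores the field name. SingleSlotAnalyzer and its SavedStream type are hypothetical, not Lucene classes. Requesting a stream for "b" re-points the very object that was already handed out for "a".

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical analyzer with one saved stream per thread, shared by every field name.
class SingleSlotAnalyzer {
    // Minimal stand-in for a reusable token stream that just reads its input back.
    static final class SavedStream {
        private Reader input;

        void reset(Reader input) { this.input = input; }

        String readAll() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }

    private final ThreadLocal<SavedStream> saved = ThreadLocal.withInitial(SavedStream::new);

    // The field argument is accepted but ignored: every call returns the same object.
    SavedStream reusableTokenStream(String field, Reader reader) {
        SavedStream s = saved.get();
        s.reset(reader);
        return s;
    }
}
```

If a caller holds the streams for "a" and "b" at the same time, reading the "a" stream yields b's text. The second call silently clobbered the first stream, which is exactly the case the documentation warns against.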

I don't know how best to solve it. I'd like a method that accepts the field name and returns a
reused token stream for that field. But I'd also like a method whose meaning is "give me a reusable
token stream, I don't care which field it's for". So maybe have two variants:
# reusableTokenStream(Reader reader)
# reusableTokenStream(String field, Reader reader)
This is somewhat related to LUCENE-1678, since I think we'd like tokenStream to return a reused
stream. But maybe it would be better to have a tokenStream that always returns a new stream, plus a
reusableTokenStream (without a field) that reuses a stream (perhaps the 'default' stream).
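A rough Java sketch of the two proposed variants, assuming per-thread storage and a per-field map. ProposedAnalyzer, its Stream type and the map-based layout are hypothetical illustrations of the idea, not an agreed or existing API. In this sketch tokenStream always builds a new stream, reusableTokenStream(Reader) reuses one "default" stream, and reusableTokenStream(String, Reader) keeps one stream per field.

```java
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical API shape for the proposal; none of these signatures are Lucene's.
class ProposedAnalyzer {
    // Minimal stand-in for a token stream bound to a Reader.
    static final class Stream {
        Reader input;

        Stream(Reader input) { this.input = input; }

        void reset(Reader input) { this.input = input; }
    }

    // Per-thread "default" stream, for callers that don't care about the field.
    private final ThreadLocal<Stream> defaultStream = new ThreadLocal<>();

    // Per-thread, per-field streams, so streams for different fields can be live at once.
    private final ThreadLocal<Map<String, Stream>> perField =
            ThreadLocal.withInitial(HashMap::new);

    // Always returns a new stream; never reused.
    Stream tokenStream(String field, Reader reader) {
        return new Stream(reader);
    }

    // Variant 1: reuse the single default stream, whatever the field.
    Stream reusableTokenStream(Reader reader) {
        Stream s = defaultStream.get();
        if (s == null) {
            s = new Stream(reader);
            defaultStream.set(s);
        } else {
            s.reset(reader);
        }
        return s;
    }

    // Variant 2: reuse a stream dedicated to this field name.
    Stream reusableTokenStream(String field, Reader reader) {
        Map<String, Stream> streams = perField.get();
        Stream s = streams.get(field);
        if (s == null) {
            s = new Stream(reader);
            streams.put(field, s);
        } else {
            s.reset(reader);
        }
        return s;
    }
}
```

The per-field variant costs one saved stream per field per thread, in exchange for letting streams for different fields be live at the same time. Two calls for the same field still share one stream, so the "one at a time" caveat remains per field.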

What do you think?

> implement reusableTokenStream for all contrib analyzers
> -------------------------------------------------------
>
>                 Key: LUCENE-1794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1794.patch, LUCENE-1794.patch
>
>
> Most contrib analyzers do not have an implementation of reusableTokenStream.
> Regardless of how expensive the back-compat reflection is for indexing speed, I think
> we should do this to mitigate any performance costs. Hey, overall it might even be an improvement!
> The back-compat code for non-final analyzers is already in place, so this is easy money
> in my opinion.
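The quoted description mentions back-compat reflection for non-final analyzers. Below is a hedged Java sketch of one way such a check can work: it compares where tokenStream(String, Reader) is declared on the runtime class. ContribAnalyzer, LegacySubclass, SimpleStream and the fallback policy are illustrative assumptions, not Lucene's source. If a subclass overrides tokenStream, reusableTokenStream falls back to it so the customized chain is not bypassed.

```java
import java.io.Reader;
import java.lang.reflect.Method;

// Hypothetical stand-in for a token stream (not Lucene's class).
class SimpleStream {
    final String chain;
    Reader input;

    SimpleStream(String chain, Reader input) {
        this.chain = chain;
        this.input = input;
    }
}

// Hypothetical non-final contrib analyzer sketching a reflection-based back-compat check.
class ContribAnalyzer {
    private final boolean overridesTokenStream;
    private final ThreadLocal<SimpleStream> saved = new ThreadLocal<>();

    ContribAnalyzer() {
        // The reflective lookup runs once per instance, not once per document.
        overridesTokenStream = declaredOutside(ContribAnalyzer.class);
    }

    // True if the runtime class's tokenStream(String, Reader) is declared somewhere
    // other than baseClass, i.e. a subclass has overridden it.
    private boolean declaredOutside(Class<?> baseClass) {
        try {
            Method m = getClass().getMethod("tokenStream", String.class, Reader.class);
            return m.getDeclaringClass() != baseClass;
        } catch (NoSuchMethodException e) {
            // tokenStream is public, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public SimpleStream tokenStream(String field, Reader reader) {
        return new SimpleStream("default chain", reader);
    }

    public SimpleStream reusableTokenStream(String field, Reader reader) {
        if (overridesTokenStream) {
            // A legacy subclass customized tokenStream but knows nothing about reuse:
            // fall back to its tokenStream so its chain is not silently bypassed.
            return tokenStream(field, reader);
        }
        SimpleStream s = saved.get();
        if (s == null) {
            s = new SimpleStream("default chain", reader);
            saved.set(s);
        } else {
            s.input = reader;
        }
        return s;
    }
}

// A hypothetical subclass written before reusableTokenStream existed.
class LegacySubclass extends ContribAnalyzer {
    @Override
    public SimpleStream tokenStream(String field, Reader reader) {
        return new SimpleStream("custom chain", reader);
    }
}
```

In this sketch the per-call cost of the check is a single boolean test. Only subclasses that override tokenStream lose reuse; everyone else gets the saved stream.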

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



