lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
Date Thu, 08 Apr 2010 11:18:36 GMT


Robert Muir commented on LUCENE-2384:

If tokenizers like StandardTokenizer just end up reading everything into RAM anyway, we should
remove Reader from the Tokenizer interface.

Supporting Reader, instead of simply tokenizing the entire document, makes our tokenizers
very complex (see CharTokenizer).
It would be nice to remove this complexity if that objective doesn't really work anyway.

> Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
> -------------------------------------------------------------
>                 Key: LUCENE-2384
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: Analysis
>    Affects Versions: 3.0.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1
> When indexing large documents, the lexer buffer may stay large forever. This sub-issue
> resets the lexer buffer back to the default on reset(Reader).
> This is done on the enclosing issue.
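
For readers unfamiliar with the problem described in the issue, here is a minimal sketch of the idea: a buffer that grows while lexing a large document and is shrunk back to its default size on reset(Reader). This is not the actual StandardTokenizerImpl code. The class name GrowableLexerBuffer, the readAll() and capacity() helpers, and the 16 KB default are all assumptions made for illustration; only the zzBuffer field name and the reset(Reader) behavior come from the issue text. The sketch also reads the whole input to force growth, which is a simplification of how a generated lexer fills its buffer.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical stand-in for a lexer's input buffer; not Lucene's actual class. */
class GrowableLexerBuffer {
    /** Assumed default size; the real constant lives in the generated lexer. */
    static final int DEFAULT_SIZE = 16 * 1024;

    private char[] zzBuffer = new char[DEFAULT_SIZE];
    private int length = 0; // number of valid chars currently in zzBuffer
    private Reader reader;

    /** Reads all remaining input, doubling the buffer whenever it is full. */
    int readAll() throws IOException {
        while (true) {
            if (length == zzBuffer.length) {
                char[] bigger = new char[zzBuffer.length * 2];
                System.arraycopy(zzBuffer, 0, bigger, 0, length);
                zzBuffer = bigger;
            }
            int n = reader.read(zzBuffer, length, zzBuffer.length - length);
            if (n < 0) {
                return length; // end of input
            }
            length += n;
        }
    }

    /** Switches to a new input and drops an oversized buffer left by a large document. */
    void reset(Reader newReader) {
        reader = newReader;
        length = 0;
        if (zzBuffer.length > DEFAULT_SIZE) {
            zzBuffer = new char[DEFAULT_SIZE];
        }
    }

    /** Current capacity of the underlying array, exposed for inspection. */
    int capacity() {
        return zzBuffer.length;
    }
}
```

Without the size check in reset(Reader), a single very large document would leave every reused tokenizer instance holding the enlarged array for the rest of its lifetime, which is the "stays large forever" behavior the issue describes.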

