lucene-dev mailing list archives

From "Marvin Humphrey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1859) TermAttributeImpl's buffer will never "shrink" if it grows too big
Date Wed, 26 Aug 2009 19:31:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748102#action_12748102 ]

Marvin Humphrey commented on LUCENE-1859:
-----------------------------------------

> I fail to see the complexity of adding one method to TermAttribute:

Death by a thousand cuts.  This is one cut.

I wouldn't even add the note to the documentation.  If you emit large tokens,
you have to plan for obscene peak memory usage anyway, and if you're not
prepared for that, you deserve what you get.  Keeping the average down 
doesn't help that.

The only reason to do this is to keep average memory usage down for
the hell of it, and if it goes in, it should be an implementation detail.

> TermAttributeImpl's buffer will never "shrink" if it grows too big
> ------------------------------------------------------------------
>
>                 Key: LUCENE-1859
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1859
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>            Priority: Minor
>
> This was also an issue with Token previously.
> If a TermAttributeImpl is populated with a very long buffer, it will never be able to
> reclaim this memory.
> Obviously, it can be argued that Tokenizers should never emit "large" tokens; however,
> it seems that TermAttributeImpl should have a reasonable static "MAX_BUFFER_SIZE" such
> that if the term buffer grows bigger than this, it will shrink back down to this size once
> the next token smaller than MAX_BUFFER_SIZE is set.
> I don't think I have actually encountered issues with this yet; however, it seems like
> if you have multiple indexing threads, you could end up with a char[Integer.MAX_VALUE] per
> thread (in the very worst-case scenario).
> Perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger
> than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE.
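
For illustration only, the shrink rule proposed in the description could look roughly like the sketch below. It is not Lucene's TermAttributeImpl: the class name ShrinkingTermBuffer, the 16K value of MAX_BUFFER_SIZE, the initial size and the growth factor are all assumptions made up for the example. The shrink happens inside the private resize step, so no new public method is needed.

```java
/** Hypothetical stand-in for a term buffer; NOT Lucene's TermAttributeImpl. */
final class ShrinkingTermBuffer {
    // Assumed cap: the issue names MAX_BUFFER_SIZE but does not give a value.
    static final int MAX_BUFFER_SIZE = 16 * 1024;
    // Assumed initial capacity, chosen arbitrarily for the sketch.
    private static final int INITIAL_SIZE = 10;

    private char[] buffer = new char[INITIAL_SIZE];
    private int length;

    /** Copies len chars of src into the buffer, resizing the buffer first. */
    void setTermBuffer(char[] src, int offset, int len) {
        resizeFor(len);
        System.arraycopy(src, offset, buffer, 0, len);
        length = len;
    }

    /** Grows when too small; shrinks back to the cap when oversized. */
    private void resizeFor(int needed) {
        if (buffer.length < needed) {
            // Grow by at least 1.5x. The old contents are about to be
            // overwritten, so they are not copied.
            int grown = buffer.length + (buffer.length >> 1);
            buffer = new char[Math.max(needed, grown)];
        } else if (buffer.length > MAX_BUFFER_SIZE && needed < MAX_BUFFER_SIZE) {
            // The buffer ballooned for an earlier large token and this token
            // fits under the cap: drop back to the cap to release the memory.
            buffer = new char[MAX_BUFFER_SIZE];
        }
    }

    int capacity() {
        return buffer.length;
    }

    String term() {
        return new String(buffer, 0, length);
    }
}
```

With these assumed values, setting a 64K-char token grows the buffer to at least 64K chars, and the next short token drops it back to 16K. Peak usage is unchanged; only the steady-state footprint per thread goes down.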

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

