opennlp-dev mailing list archives

From Jeffrey Zemerick <jzemer...@apache.org>
Subject Re: Multiple models and String.intern
Date Wed, 08 Feb 2017 17:37:46 GMT
Would it be possible to add an option or setting that controls whether
string pooling is used? The option would provide backward compatibility
for anyone who would otherwise have to adjust -XX:StringTableSize because
their existing models outgrow the JVM's default string table size. It
would also be useful when the models were built from different data
sources. (I'm assuming that in that case string pooling would be
detrimental to performance.)

Jeff


On Wed, Feb 8, 2017 at 5:50 AM, Joern Kottmann <kottmann@gmail.com> wrote:

> Hello all,
>
> I often run multiple models in production, usually trained on the same
> data but with different types (the typical name finder scenario). There
> could be one model to detect person names and another to detect locations.
> The predicate Strings inside those models are always the same, but the
> models can't share the same String instance.
>
> I would like to propose that we use String.intern in the model reader to
> ensure one string is only loaded once.
>
> We tried that in the past and this caused lots of issues with PermGen
> space, but this was improved over time in Java. In Java 8 (on which we
> depend now) this should work properly.
>
> Here is an interesting article about it:
> http://java-performance.info/string-intern-in-java-6-7-8/
>
> Using String.intern will make the model loading a bit slower (we can
> benchmark that).
>
> Jörn
>
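
For readers of the archive, here is a minimal Java sketch of the two ideas
in this thread: interning predicate strings while a model is read, and
making that behaviour optional. The class name, the readPredicates method
and the internStrings flag are hypothetical and chosen for illustration
only; they are not part of the OpenNLP API, and this is not the change
that was proposed or merged.

```java
import java.util.ArrayList;
import java.util.List;

public class PredicateInterningSketch {

    /**
     * Hypothetical stand-in for a model reader. The internStrings flag is
     * an assumed option, not an existing OpenNLP setting.
     */
    static List<String> readPredicates(String[] raw, boolean internStrings) {
        List<String> predicates = new ArrayList<>(raw.length);
        for (String s : raw) {
            // new String(...) simulates a string freshly decoded from a model
            // file: equal content, but a distinct object on the heap.
            String decoded = new String(s);
            // intern() returns the canonical instance from the JVM string pool.
            predicates.add(internStrings ? decoded.intern() : decoded);
        }
        return predicates;
    }

    public static void main(String[] args) {
        String[] raw = {"w=john", "prev=mr", "def"};

        // Two "models" (e.g. person and location) with identical predicates.
        List<String> personPooled = readPredicates(raw, true);
        List<String> locationPooled = readPredicates(raw, true);
        List<String> personPlain = readPredicates(raw, false);
        List<String> locationPlain = readPredicates(raw, false);

        // With interning, both models share one String instance per predicate.
        System.out.println(personPooled.get(0) == locationPooled.get(0)); // true
        // Without interning, the contents are equal but the instances differ.
        System.out.println(personPlain.get(0) == locationPlain.get(0)); // false
        System.out.println(personPlain.get(0).equals(locationPlain.get(0))); // true
    }
}
```

When interning is enabled and a very large number of distinct predicates
is loaded, the string table can be enlarged with the JVM flag mentioned
above, -XX:StringTableSize=<n>. Whether that is needed depends on the
models and would have to be measured.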
