opennlp-dev mailing list archives

From Thilo Goetz
Subject Re: Thread-safe versions of some of the tools
Date Thu, 12 Jan 2017 08:48:48 GMT
On 11/01/2017 22:51, Joern Kottmann wrote:
> On Wed, 2017-01-11 at 11:05 +0100, Thilo Goetz wrote:
>> in a recent project, I was using SentenceDetectorME, TokenizerME and
>> POSTaggerME. It turns out that none of those is thread safe. This is
>> because the classification probabilities for the last tag() call (for
>> example) are stored in a member variable and can be retrieved by a
>> separate API call.
> The POSTagger already has the Sequence object to return the result
> with probabilities. If we introduced a new method, we could probably
> just deprecate the method to retrieve the probs.
> Should be a minor change to have an interface that can be thread safe.
I don't want to muddy the waters, but I had another idea: we could also
add a getThreadLocal() method to the tools that need it. You would create a
POSTaggerME (for example) as usual, and if you needed a per-thread
instance, you would call getThreadLocal(), which returns another
POSTaggerME backed by the same model, one per calling thread. The advantage,
as I see it, is that the API extension is conservative (just one method
added), and retrieving the probabilities keeps working as before
because each thread has its own instance.
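
To make the idea a bit more concrete, here is a rough sketch of what I have in mind. It does not use the real OpenNLP classes: Model and Tagger are hypothetical stand-ins for a shared model and a tool like POSTaggerME, the tagging itself is a placeholder, and the way getThreadLocal() is wired up is only one possible approach.

```java
import java.util.Arrays;

public class ThreadLocalTaggerSketch {

    /** Hypothetical stand-in for an immutable model that is safe to share. */
    static final class Model {
        final String name;
        Model(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a tool such as POSTaggerME (not the OpenNLP class). */
    static final class Tagger {
        private final Model model;
        // Lazily creates one Tagger per calling thread, all backed by the same model.
        private final ThreadLocal<Tagger> perThread;
        // State left over from the last tag() call; this is what breaks thread safety.
        private double[] lastProbs = new double[0];

        Tagger(Model model) {
            this.model = model;
            this.perThread = ThreadLocal.withInitial(() -> new Tagger(model));
        }

        /** Placeholder tagging: no real classification happens here. */
        String[] tag(String[] tokens) {
            String[] tags = new String[tokens.length];
            double[] probs = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                tags[i] = "X";
                probs[i] = 1.0 / (1 + tokens[i].length()); // made-up score
            }
            lastProbs = probs;
            return tags;
        }

        /** Probabilities of the last tag() call on this instance. */
        double[] probs() { return lastProbs.clone(); }

        /** The proposed addition: an instance private to the calling thread. */
        Tagger getThreadLocal() { return perThread.get(); }

        Model model() { return model; }
    }

    public static void main(String[] args) throws InterruptedException {
        Tagger shared = new Tagger(new Model("en-pos"));
        Runnable work = () -> {
            Tagger mine = shared.getThreadLocal(); // never touches another thread's state
            String name = Thread.currentThread().getName();
            mine.tag(new String[] {name});
            System.out.println(name + " " + Arrays.toString(mine.probs()));
        };
        Thread a = new Thread(work, "a");
        Thread b = new Thread(work, "bbb");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Each thread calls tag() and probs() on its own instance, so the "last call" state is never shared, while the (potentially large) model is loaded only once.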

Does that make sense? I'm not sure I'm explaining this in the best 
possible manner...

