opennlp-dev mailing list archives

From Thilo Goetz <>
Subject Re: Thread-safe versions of some of the tools
Date Thu, 12 Jan 2017 12:07:17 GMT
On 12/01/2017 10:20, Joern Kottmann wrote:
> The POSTagger interface just grew over time and I am not sure it is
> actually that great. Today there are different ways of returning
> probabilities.
> - tag and probs (this is POSTaggerME only and not in the interface)
> - topKSequences, which returns multiple possible Sequence objects
> Wouldn't it make sense to unify that? One new method which takes a sentence
> and returns the best Sequence.
> What do you think about a thread-safe wrapper object for the POS tagger? If
> you want thread safety, you instantiate that one; internally it could use
> ThreadLocal to switch between multiple POSTaggerME instances. Since
> POSTagger (the interface) can be thread safe as it is now, this seems to be
> a rather simple change.

I agree, that's even better.

I'd be happy to do that. What's the procedure nowadays? Create a pull 
request on GH, which is then reviewed?
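
To make the wrapper idea concrete, here is a rough sketch of how it could look. It does not use the real OpenNLP types: Tagger, StatefulTagger and ThreadSafeTagger are hypothetical names made up for illustration, and the tagging rule is a toy. The only point is that the wrapper implements the same interface and hands each thread its own delegate through ThreadLocal.

```java
import java.util.function.Supplier;

// Hypothetical minimal stand-in for the POSTagger interface (not the OpenNLP type).
interface Tagger {
    String[] tag(String[] sentence);
}

// Hypothetical stand-in for a stateful tool such as POSTaggerME.
final class StatefulTagger implements Tagger {
    // Per-call state kept in a member variable; this is what makes a
    // single shared instance unsafe to use from several threads.
    private String[] lastResult;

    @Override
    public String[] tag(String[] sentence) {
        lastResult = new String[sentence.length];
        for (int i = 0; i < sentence.length; i++) {
            String token = sentence[i];
            // Toy rule for illustration only: capitalized tokens become NNP.
            boolean capitalized = !token.isEmpty() && Character.isUpperCase(token.charAt(0));
            lastResult[i] = capitalized ? "NNP" : "NN";
        }
        return lastResult;
    }
}

// Wrapper that implements the same interface and delegates to one
// instance per thread, created lazily by the supplied factory.
final class ThreadSafeTagger implements Tagger {
    private final ThreadLocal<Tagger> perThread;

    ThreadSafeTagger(Supplier<? extends Tagger> factory) {
        this.perThread = ThreadLocal.withInitial(factory);
    }

    @Override
    public String[] tag(String[] sentence) {
        return perThread.get().tag(sentence);
    }
}

final class ThreadSafeTaggerDemo {
    public static void main(String[] args) {
        Tagger tagger = new ThreadSafeTagger(StatefulTagger::new);
        // Callers keep writing tagger.tag(...), whichever thread they are on.
        System.out.println(String.join(" ", tagger.tag(new String[] {"Berlin", "weather"})));
    }
}
```

With this shape the call site stays tagger.tag(...). The cost is one delegate per thread, so in a real implementation the model would have to be shared between the delegates rather than loaded once per thread.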


> With your proposed solution a user would have to write
> tagger.getThreadLocal().tag(...)
> instead of
> tagger.tag(...)
> Jörn
> On Thu, Jan 12, 2017 at 9:48 AM, Thilo Goetz <> wrote:
>> On 11/01/2017 22:51, Joern Kottmann wrote:
>>> On Wed, 2017-01-11 at 11:05 +0100, Thilo Goetz wrote:
>>>> in a recent project, I was using SentenceDetectorME, TokenizerME and
>>>> POSTaggerME. It turns out that none of those is thread safe. This is
>>>> because the classification probabilities for the last tag() call (for
>>>> example) are stored in a member variable and can be retrieved by a
>>>> separate API call.
>>> The POSTagger already has the Sequence object to return the result
>>> with probabilities. If we introduce a new method, we can probably
>>> just deprecate the method to retrieve the probs.
>>> Should be a minor change to have an interface that can be thread safe.
>>> [...]
>> I don't want to muddy the waters, but I had another idea: we could also
>> add a getThreadLocal() method to the tools we want. You would create a
>> POSTaggerME (for example) like always, and if you needed a per thread
>> version, you could then call getThreadLocal(), which would give you another
>> POSTaggerME with the same model, per thread. The advantage as I see it is
>> that the API extension would be conservative (just one method added), and
>> getting the probabilities would continue to work as before because you have
>> one instance per thread.
>> Does that make sense? I'm not sure I'm explaining this in the best
>> possible manner...
>> --Thilo
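
For comparison, the getThreadLocal() variant quoted above could be sketched as follows. Again, nothing here is OpenNLP code: ToyModel and ToyTagger are hypothetical stand-ins, and the probabilities are invented constants. The sketch only shows one method being added to the tool, with the per-thread copies sharing a model and each keeping its own last-call probabilities.

```java
// Hypothetical stand-in for a shared, immutable model (not an OpenNLP class).
final class ToyModel {
    String tagFor(String token) {
        boolean capitalized = !token.isEmpty() && Character.isUpperCase(token.charAt(0));
        return capitalized ? "NNP" : "NN";
    }
}

// Hypothetical stand-in for a stateful tool such as POSTaggerME.
final class ToyTagger {
    private final ToyModel model;
    private final ThreadLocal<ToyTagger> perThread;
    // State of the last tag() call, retrievable through probs().
    private double[] lastProbs = new double[0];

    ToyTagger(ToyModel model) {
        this.model = model;
        // Lazily creates one extra tagger per thread, all sharing the same model.
        this.perThread = ThreadLocal.withInitial(() -> new ToyTagger(model));
    }

    // The single added method: returns the calling thread's private copy.
    ToyTagger getThreadLocal() {
        return perThread.get();
    }

    String[] tag(String[] sentence) {
        String[] tags = new String[sentence.length];
        double[] probs = new double[sentence.length];
        for (int i = 0; i < sentence.length; i++) {
            tags[i] = model.tagFor(sentence[i]);
            // Invented confidence values, for illustration only.
            probs[i] = "NNP".equals(tags[i]) ? 0.9 : 0.6;
        }
        lastProbs = probs;
        return tags;
    }

    // Probabilities of the last tag() call on this instance.
    double[] probs() {
        return lastProbs.clone();
    }
}

final class GetThreadLocalDemo {
    public static void main(String[] args) {
        ToyTagger shared = new ToyTagger(new ToyModel());
        ToyTagger mine = shared.getThreadLocal();
        mine.tag(new String[] {"Berlin", "weather"});
        System.out.println(mine.probs().length);
    }
}
```

Because tag() and probs() are called on the same per-thread copy, the two-call pattern keeps working, at the price of the longer call site tagger.getThreadLocal().tag(...).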
