opennlp-dev mailing list archives

From Joern Kottmann <kottm...@gmail.com>
Subject Re: new tool training
Date Thu, 27 Oct 2016 19:15:35 GMT
On Thu, 2016-10-27 at 15:49 +0000, Russ, Daniel (NIH/CIT) [E] wrote:
> 
> Comment 2:
> Do you have a preference where the variable should go?  I think
> AbstractTrainer is the appropriate place for PSF variables dealing
> with ALL trainers, so Threads_(P/D) should be there.  I would remove
> them and refactor them out of TrainingParams.

TrainingParameters is the class that parses the passed-in params
file. The only parameter it has to know about is "Algorithm"; all the
others are specific to the trainer implementation.

I think AbstractTrainer is probably a good place for PSF variables
which deal with many/most trainers.
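A minimal sketch of that split, assuming plain Java stand-ins rather than the real OpenNLP classes: the hypothetical ParamsSketch plays the role of TrainingParameters and interprets only "Algorithm", while a hypothetical TrainerBaseSketch holds the shared public static final (PSF) constants. The "Threads" key and its default of 1 are illustrative assumptions, not values taken from this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the params-file parser: the only key it
// interprets itself is "Algorithm"; every other key is passed through.
class ParamsSketch {
    static final String ALGORITHM_PARAM = "Algorithm";
    private final Map<String, String> settings = new HashMap<>();

    void put(String key, String value) {
        settings.put(key, value);
    }

    String algorithm() {
        return settings.get(ALGORITHM_PARAM);
    }

    // Trainer-specific keys are looked up by the trainer, not interpreted here.
    String get(String key) {
        return settings.get(key);
    }
}

// Hypothetical base class holding PSF constants shared by many trainers.
// THREADS_PARAM / THREADS_DEFAULT are assumed names and values.
abstract class TrainerBaseSketch {
    static final String THREADS_PARAM = "Threads";
    static final int THREADS_DEFAULT = 1;

    // Reads the shared parameter, falling back to the shared default.
    protected int threads(ParamsSketch params) {
        String value = params.get(THREADS_PARAM);
        return value == null ? THREADS_DEFAULT : Integer.parseInt(value);
    }
}

// Hypothetical concrete trainer that reuses the inherited constant lookup.
class ExampleTrainerSketch extends TrainerBaseSketch {
    int configuredThreads(ParamsSketch params) {
        return threads(params);
    }
}
```

With this layout the parser never needs to change when a trainer gains a new parameter; only the trainer (or the shared base class) does.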


> Comment 3:
> Right I want to change the dataindexer.
> 
> So I have multiple models that classify data (Job descriptions) into
> Occupational Codes.  I know what the codes are a priori, and even if
> they are not in the training data, I need to make sure that there is
> SOME probability for the codes.  More importantly for each job
> description, I need to compare the probabilities returned for each
> output.  By forcing the output indices to have the same values, I can
> quickly compare them without re-mapping the output.
> 
> I tried to extend OnePassDataIndexer, but the indexing occurs during
> object construction, so I cannot set the known outputs before
> indexing occurs.  
> 
> Of course I would not need the getDataIndexer() method, but since it
> is defined in the abstract class, why not in the interface?
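The fixed-index idea described in the quoted comment can be sketched as follows, assuming a simplified, hypothetical outcome index rather than OpenNLP's OnePassDataIndexer: the known codes are registered in the constructor, before any training event is indexed, so two models built from the same code list share the same outcome indices. The class name and the "code-N" labels are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical outcome index: known outcomes get fixed indices first,
// in list order, so models built with the same list agree on indices.
final class FixedOutcomeIndex {
    private final Map<String, Integer> indexByOutcome = new LinkedHashMap<>();

    // Known outcomes are registered before any event is seen.
    FixedOutcomeIndex(List<String> knownOutcomes) {
        for (String outcome : knownOutcomes) {
            indexOf(outcome);
        }
    }

    // Returns the index of an outcome, assigning the next free one if unseen.
    int indexOf(String outcome) {
        Integer existing = indexByOutcome.get(outcome);
        if (existing != null) {
            return existing;
        }
        int next = indexByOutcome.size();
        indexByOutcome.put(outcome, next);
        return next;
    }

    // Maps each training event's outcome to its (possibly pre-assigned) index.
    int[] indexEvents(List<String> eventOutcomes) {
        int[] result = new int[eventOutcomes.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = indexOf(eventOutcomes.get(i));
        }
        return result;
    }

    // Outcome labels in index order: position i holds the label with index i.
    List<String> outcomeLabels() {
        return new ArrayList<>(indexByOutcome.keySet());
    }
}
```

Because position i means the same code in every model, probability arrays from different models can be compared element by element without remapping. A fixed slot only guarantees that an unseen code is present in the outcome list; how much probability it receives is still up to the trainer.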


The thing is that the current interface lets us support
implementations that don't use the Data Indexer at all, for example
trainers that rely on external machine learning libraries. Since 1.6.0
we have pluggable ML support.
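To illustrate why the interface stays free of getDataIndexer(), here is a sketch with hypothetical types (LabeledEvent, OutcomeModel, EventTrainerSketch), not OpenNLP's actual EventTrainer API: the interface exposes only train(), so an implementation wrapping an external library never has to build a data indexer. The counting trainer merely stands in for such a wrapper.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical training event: an outcome label plus its context text.
final class LabeledEvent {
    final String outcome;
    final String text;

    LabeledEvent(String outcome, String text) {
        this.outcome = outcome;
        this.text = text;
    }
}

// Hypothetical trained model: returns a probability for an outcome label.
interface OutcomeModel {
    double probability(String outcome);
}

// Hypothetical trainer interface: only train() is required, so nothing
// forces an implementation to create or expose a data indexer.
interface EventTrainerSketch {
    OutcomeModel train(List<LabeledEvent> events);
}

// Stand-in for a trainer backed by an external library: it never builds
// a data indexer and simply estimates outcome priors from raw counts.
class ExternalLibraryTrainerSketch implements EventTrainerSketch {
    @Override
    public OutcomeModel train(List<LabeledEvent> events) {
        Map<String, Integer> counts = new HashMap<>();
        for (LabeledEvent event : events) {
            counts.merge(event.outcome, 1, Integer::sum);
        }
        final int total = events.size();
        return outcome -> total == 0 ? 0.0 : counts.getOrDefault(outcome, 0) / (double) total;
    }
}
```

Trainers that do use an indexer can still share a factory method through an abstract base class without putting it on the interface.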

I have looked more closely now: getDataIndexer is a factory method for
the Data Indexer. Maybe it would make sense to allow specifying a
custom data indexing class as part of the training parameters? Then
the trainers that use the Data Indexer can simply support that
mechanism.
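One possible shape for that proposal, as a sketch only: the parameter key "DataIndexer" and all class names below are hypothetical, and the params are a plain Map instead of TrainingParameters. The factory method reads an optional class name from the params, instantiates it reflectively, and falls back to a default indexer when the key is absent.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical indexer abstraction used only by indexer-based trainers.
interface IndexerSketch {
    List<String> index(List<String> outcomes);
}

// Default: outcomes get indices in first-seen order.
class FirstSeenIndexerSketch implements IndexerSketch {
    @Override
    public List<String> index(List<String> outcomes) {
        return new ArrayList<>(new LinkedHashSet<>(outcomes));
    }
}

// Alternative: sorted order, so independently trained models agree.
class SortedIndexerSketch implements IndexerSketch {
    @Override
    public List<String> index(List<String> outcomes) {
        return new ArrayList<>(new TreeSet<>(outcomes));
    }
}

// Hypothetical base class for trainers that use a data indexer.
abstract class IndexingTrainerBase {
    // Assumed parameter key naming the indexer class to instantiate.
    static final String DATA_INDEXER_PARAM = "DataIndexer";

    // Factory method: custom class from the params, else the default.
    protected IndexerSketch getDataIndexer(Map<String, String> params) {
        String className = params.get(DATA_INDEXER_PARAM);
        if (className == null) {
            return new FirstSeenIndexerSketch();
        }
        try {
            return Class.forName(className)
                    .asSubclass(IndexerSketch.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create data indexer: " + className, e);
        }
    }
}

// Hypothetical concrete trainer that just reports the indexed outcome labels.
class OutcomeListingTrainer extends IndexingTrainerBase {
    List<String> outcomeLabels(Map<String, String> params, List<String> outcomes) {
        return getDataIndexer(params).index(outcomes);
    }
}
```

Trainers that don't extend the indexer-based base class simply never read the key, so the mechanism stays optional for external ML implementations.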

Jörn
