opennlp-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joern Kottmann <kottm...@gmail.com>
Subject Re: Codec classes (BioCodec and BilouCodec)
Date Wed, 15 Mar 2017 08:53:06 GMT
Yes, it would be good to refactor that, adding a base class sounds good.

Jörn

On Tue, Mar 14, 2017 at 11:08 PM, Peter Thygesen <thygesen.opennlp@gmail.com
> wrote:

> Thx for the reply.
>
> BTW
> BioCodec also contains a static method: String extractNameType(...) this
> method should be move out. Perhaps to a base class called Codec (or
> CodecBase... or how you normally named base classes) The method is also
> called by BilouCodec, which then is calling directly to BioCodec... (which
> smells)
>
> Perhaps another refactoring task?
>
> /Peter
>
> 2017-03-14 12:48 GMT+01:00 Joern Kottmann <kottmann@gmail.com>:
>
> > On Sun, Mar 12, 2017 at 11:19 PM, Peter Thygesen <
> > thygesen.opennlp@gmail.com
> > > wrote:
> >
> > > Hi OpenNLP Developers
> > >
> > >
> > >
> > > After I some weeks ago added a PR for NameFinderSequenceValidator
> test, I
> > > spent some time looking into the BilouNameFinderSequenceValidator and
> > the
> > > Codec classes, trying to understand how they work and I wrote some more
> > > tests that were missing (OPENNLP-1000)
> > >
> > >
> > >
> > > I have a few comments and questions I hope someone here can answer.
> > >
> > >
> > >
> > > General question:
> > >
> > > Why are NameFinder.START and NameFinder.CONTINUE used I the codec
> classes
> > > and namefindervalidator classes when BioCodec/BilouCodec have their own
> > > tags?
> > >
> > > NameFinder should not have them after my opinion. They belong to the
> > Codec
> > > classes.
> > >
> >
> > Yes, that is how it should be. The used to be part of the NameFinder but
> > then were moved to their own class. It would be good to refactor that.
> >
> >
> > >
> > > The BIO doesn’t really match the “Start”, “Continue”, “Other”
tags..
> even
> > > though its “similar”. And for the BILOU its mixed opennlp+bilou:
> > > Start,Continue,Last,Unit,Other… Some what messy I think. But not BIG
> > thing
> > > for me, and yes BILOU also have many names.. (like BMEWO)
> > >
> >
> >
> > The tags were never renamed because that would break backward
> compatibility
> > with existing models.
> >
> >
> >
> > >
> > > Class: BioCodec
> > >
> > > Method: areOutcomesCompatible(String[] outcomes)
> > >
> > >
> > >
> > > I assume that this method is checking that tags in the model are
> > compatible
> > > with opennlp tags (from the codec). And that it receives an array for
> > > unique tags in the model.
> > >
> > > Is this true?
> > >
> > >
> > Yes
> >
> >
> > >
> > > The comment in the beginning of the method says: ”… To validate the
> model
> > > we check if we have one outcome named ”other”…
> > >
> > > However, this is not directly checked. The current test will pass
> > outcomes
> > > without “other” if the outcomes have other valid tags.
> > >
> > > So, can a model be compatible without an “other” tag?
> > >
> > >
> > > E.g.
> > >
> > > Are outcomes=[“person-start”], [“person-start”, “person-continue”]
or
> > > [“person-start”, “location-start”] compatible models?
> > >
> > >
> > This could be the case, in practice this all probably doesn't matter that
> > much.
> > This check should fail if a user somehow ends up with a model which
> doesn't
> > come from the name finder, e.g. a  pos tagger maxent model shouldn't be
> > validated as valid by this method.
> >
> >
> > >
> > > The method could use Set instead of Lists (yes.. the lists are probably
> > > small for single class models)
> > >
> > >
> > >
> > > Class: BilouCodec
> > >
> > > Method: areOutcomesCompatiple(String[] outcomes) implementation is
> > missing.
> > >
> > > I made one which you can have and review.
> > >
> > >
> > >
> > This would be great if you could contribute it.
> >
> >
> >
> > >
> > > Like the BioCodec I need to know whether a model is compatible without
> > the
> > > “other” tag.
> > >
> > >
> > I think having a valid model with out other is ok.
> >
> >
> >
> > >
> > > Class: BilouNameFinderValidator
> > >
> > > I have written a unit test (OPENNLP-1000) for this class and it shows
> its
> > > buggy. I will soon push a PR for this.
> > >
> > > Failed tests: (should not be valid)
> > >
> > > “Start” followed by “Start”
> > >
> > > “Start” followed by “Other”
> > >
> > > “Start” followed by “Unit”
> > >
> > > “Continue” followed by “Start”
> > >
> > > “Continue” followed by “Unit”
> > >
> > > “Last” followed by “Continue”
> > >
> > > “Last” followed by “Last”
> > >
> > >
> > I reviewed your PR and this looks good. I need to run the evaluation
> tests
> > once with that PR and then we can merge it.
> >
> >
> >
> > >
> > > I can create some JIRA and PR for my test if you like. (I hope you like
> > :-)
> > >
> > >
> >
> > Please do that and send us more PRs, all you contributions are very
> welcome
> > and especially writing tests helps us a lot to improve code quality.
> >
> > Jörn
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message