opennlp-users mailing list archives

From Jason Baldridge <jasonbaldri...@gmail.com>
Subject Re: Spanish trained models for POS tagging
Date Thu, 12 Apr 2012 16:00:39 GMT
Thanks much! I merged the pull request yesterday.

On Mon, Apr 9, 2012 at 6:40 PM, Juan Manuel Caicedo Carvajal <
juan@cavorite.com> wrote:

> Hello,
>
> I finally made the pull request that includes the models for the POS
> tagger for Spanish.
>
> I created the models using their original tags and also the universal
> POS tags. For each tag set I trained two models: one using maxent and
> the other using perceptron.
>
> The pull request contains the models and the scripts that I used to train
> them:
>
> https://github.com/utcompling/OpenNLP-Models/pull/1
>
> Cheers,
>
> Juan Manuel Caicedo
>
> On Mon, Feb 13, 2012 at 12:47 PM, Jason Baldridge
> <jasonbaldridge@gmail.com> wrote:
> >
> >
> > On Thu, Feb 9, 2012 at 7:36 AM, Juan Manuel Caicedo Carvajal
> > <juan@cavorite.com> wrote:
> >>
> >> (Sorry for the late reply)
> >>
> >> I just cloned the repository and I'll add the scripts I used to
> >> convert the input files and to train the models. This afternoon I'll
> >> put them together in a pull request.
> >>
> >
> > Great!
> >
> >>
> >> Should we keep a copy of the training data on GitHub? I think it could
> >> be useful for retraining the models, and it would also help in case
> >> the original files are no longer available (e.g. 404 errors).
> >> Otherwise, would it be enough to include links to those files?
> >>
> > It depends on whether it is legal to do so. For example, the Norwegian data
> > used to train the models there cannot be distributed. If it is fine to have
> > it and the corpus isn't too massive, then it might make sense.
> >
> >
> >>
> >> I also have a script for generating a Maven repository for the models.
> >> The GitHub project could also be used for hosting that repository,
> >> what do you think?
> >>
> >
> > +1 Sounds interesting, so if you want to set that up, it sounds good to me.
> >
> > -Jason
> >
> >> On Thu, Feb 2, 2012 at 7:50 PM, Jason Baldridge
> >> <jasonbaldridge@gmail.com> wrote:
> >> > That's great! Would you be interested in contributing code and/or data to
> >> > the OpenNLP Models repo?
> >> >
> >> > https://github.com/utcompling/OpenNLP-Models
> >> >
> >> >
> >> >
> >> > On Thu, Feb 2, 2012 at 4:02 PM, Juan Manuel Caicedo Carvajal
> >> > <juan@cavorite.com> wrote:
> >> >>
> >> >> Hello everyone,
> >> >>
> >> >> I trained POS tagging models for Spanish using the CoNLL data [1].
> >> >>
> >> >> I created two versions, each using a different model type (perceptron
> >> >> and maxent), and I also created versions of the models using the
> >> >> universal Part-of-Speech Tags [2].
> >> >>
> >> >> I uploaded the files to my server. You can read more details here,
> >> >> including the evaluation results:
> >> >>
> >> >> http://cavorite.com/labs/nlp/opennlp-models-es/
> >> >>
> >> >> And the files are here:
> >> >>
> >> >> http://files.cavorite.com/projects/opennlp-models-es/ner/models/
> >> >>
> >> >>
> >> >> Feel free to host them on the OpenNLP website and do not hesitate to
> >> >> send me your questions or comments.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> Juan Manuel Caicedo
> >> >>
> >> >> [1] http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html
> >> >> [2] http://code.google.com/p/universal-pos-tags/
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Jason Baldridge
> >> > Associate Professor, Department of Linguistics
> >> > The University of Texas at Austin
> >> > http://www.jasonbaldridge.com
> >> > http://twitter.com/jasonbaldridge
> >> >
> >> >
> >
>
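
The conversion to universal POS tags mentioned in the thread can be sketched
roughly as follows. This is only an illustration, not the scripts from the
pull request: it assumes a tab-separated mapping file (fine-grained tag, then
universal tag) and training lines made of word_TAG tokens, and the tag names
(DA, NC, VMI), the sample mapping, and both function names are hypothetical.

```python
# Illustrative sketch only. The mapping entries and tag names below are
# assumptions for the example, not the contents of the real scripts.

def parse_tag_map(text):
    """Parse 'FINE<TAB>UNIVERSAL' lines into a dict, skipping blank lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fine, universal = line.split("\t")
        mapping[fine] = universal
    return mapping


def to_universal(sentence, mapping, default="X"):
    """Rewrite one 'word_TAG word_TAG ...' line using universal tags.

    Tags missing from the mapping fall back to `default` ("X" is the
    catch-all category in the universal tag set).
    """
    converted = []
    for token in sentence.split():
        # Split on the last underscore so words containing '_' survive.
        word, sep, tag = token.rpartition("_")
        if not sep:
            # No tag attached; leave the token unchanged.
            converted.append(token)
            continue
        converted.append(f"{word}_{mapping.get(tag, default)}")
    return " ".join(converted)


# Hypothetical three-entry mapping, for demonstration only.
SAMPLE_MAP = "DA\tDET\nNC\tNOUN\nVMI\tVERB\n"

if __name__ == "__main__":
    tag_map = parse_tag_map(SAMPLE_MAP)
    print(to_universal("El_DA abogado_NC dijo_VMI", tag_map))
```

Running the converted file through the same training step as the original
one would then give the second set of models, one per tag set.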



