opennlp-users mailing list archives

From Juan Manuel Caicedo Carvajal <j...@cavorite.com>
Subject Re: Spanish trained models for POS tagging
Date Mon, 09 Apr 2012 23:40:05 GMT
Hello,

I finally made the pull request that includes the models for the POS
tagger for Spanish.

I trained models with both the corpus's original tag set and the
universal POS tags. For each tag set I trained two models: one using
maxent and the other using perceptron.

The pull request contains the models and the scripts that I used to train them:

https://github.com/utcompling/OpenNLP-Models/pull/1
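
For readers who have not seen this kind of conversion, here is a minimal sketch of the idea. It is not the script from the pull request. It assumes CoNLL-style input with one token per line, the word in the first column, the POS tag in the second, and a blank line between sentences, and it writes the one-sentence-per-line word_tag format that the OpenNLP POS tagger trainer reads. The function name, the column defaults, and the sample tags are made up for illustration.

```python
def conll_to_opennlp(lines, word_col=0, tag_col=1):
    """Turn CoNLL-style column lines into 'word_tag word_tag ...' sentences.

    The column positions are assumptions; adjust them to the actual corpus.
    """
    sentences = []
    current = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(" ".join(current))
                current = []
            continue
        cols = line.split()
        if len(cols) <= max(word_col, tag_col):
            continue  # skip lines that lack the expected columns
        current.append(f"{cols[word_col]}_{cols[tag_col]}")
    if current:
        # Flush the last sentence if the input does not end with a blank line.
        sentences.append(" ".join(current))
    return sentences


# Hypothetical sample input (the tags are illustrative only).
sample = ["El DA O", "perro NC O", "ladra VMI O", "", "Hola I O"]
for sentence in conll_to_opennlp(sample):
    print(sentence)
```

Each printed line is one training sentence. Mapping the original tags to the universal tag set would be an extra lookup step applied to the tag column before joining.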

Cheers,

Juan Manuel Caicedo

On Mon, Feb 13, 2012 at 12:47 PM, Jason Baldridge
<jasonbaldridge@gmail.com> wrote:
>
>
> On Thu, Feb 9, 2012 at 7:36 AM, Juan Manuel Caicedo Carvajal
> <juan@cavorite.com> wrote:
>>
>> (Sorry for the late reply)
>>
>> I just cloned the repository and I'll add the scripts I used to
>> convert the input files and to train the models. This afternoon I'll
>> put them together in a pull request.
>>
>
> Great!
>
>>
>> Should we keep a copy of the training data on GitHub? I think it could
>> be useful for retraining the models, and it would also help in case
>> the original files are no longer available (e.g. 404 errors).
>> Otherwise, would it be enough to include links to those files?
>>
> It depends on whether it is legal to do so. For example, the Norwegian data
> used to train the models there cannot be distributed. If it is fine to have
> it and the corpus isn't too massive, then it might make sense.
>
>
>>
>> I also have a script for generating a Maven repository for the models.
>> The GitHub project could also be used to host that repository.
>> What do you think?
>>
>
> +1 Sounds interesting, so if you want to set that up, it sounds good to me.
>
> -Jason
>
>> On Thu, Feb 2, 2012 at 7:50 PM, Jason Baldridge
>> <jasonbaldridge@gmail.com> wrote:
>> > That's great! Would you be interested in contributing code and/or data
>> > to the OpenNLP Models repo?
>> >
>> > https://github.com/utcompling/OpenNLP-Models
>> >
>> >
>> >
>> > On Thu, Feb 2, 2012 at 4:02 PM, Juan Manuel Caicedo Carvajal
>> > <juan@cavorite.com> wrote:
>> >>
>> >> Hello everyone,
>> >>
>> >> I trained POS tagging models for Spanish using the CoNLL data [1].
>> >>
>> >> I created two versions using different model types (perceptron and
>> >> maxent), and I also created versions of the models using the universal
>> >> Part-of-Speech Tags [2].
>> >>
>> >> I uploaded the files to my server. You can read more details here,
>> >> including the evaluation results:
>> >>
>> >> http://cavorite.com/labs/nlp/opennlp-models-es/
>> >>
>> >> And the files are here:
>> >>
>> >> http://files.cavorite.com/projects/opennlp-models-es/ner/models/
>> >>
>> >>
>> >> Feel free to host them on the OpenNLP website and do not hesitate to
>> >> send me your questions or comments.
>> >>
>> >> Cheers,
>> >>
>> >> Juan Manuel Caicedo
>> >>
>> >> [1] http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html
>> >> [2] http://code.google.com/p/universal-pos-tags/
>> >
>> >
>> >
>> >
>> > --
>> > Jason Baldridge
>> > Associate Professor, Department of Linguistics
>> > The University of Texas at Austin
>> > http://www.jasonbaldridge.com
>> > http://twitter.com/jasonbaldridge
>> >
>> >
>
>
>
>
> --
> Jason Baldridge
> Associate Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
>
>
