opennlp-dev mailing list archives

From Aliaksandr Autayeu
Subject Re: Portuguese
Date Mon, 05 Dec 2011 23:14:44 GMT
What about more diverse languages? Chinese, Arabic, and Russian might be good
examples. In a sense, they would provide wider test coverage, and each of them
has quite a large audience. Finding annotated resources and speakers on the
team might be a problem, though.


On Mon, Dec 5, 2011 at 10:33 PM, Jason Baldridge wrote:

> One thing that I think might be nice moving forward is to develop a robust
> set of models and test sets that involve at least two languages. I'm
> thinking Portuguese would be a good one in addition to English since:
>   - several of us speak it (I'm a non-native speaker who lived in Brazil
>   for a couple of years -- who else?)
>   - there are truly free annotated resources for it:
>   - it's pretty darn widely spoken in the world, both as first and second
>   language
> Doing something like this would help push the annotation effort forward as
> well. E.g., committing to support a language means we need to get at least
> some annotations going for each level of analysis we want to support, and
> that will in turn spur development on the tool that Jorn has been putting
> together.
> Jason
> --
> Jason Baldridge
> Associate Professor, Department of Linguistics
> The University of Texas at Austin
