opennlp-dev mailing list archives

From Jason Baldridge <jasonbaldri...@gmail.com>
Subject Portuguese
Date Mon, 05 Dec 2011 21:33:47 GMT
One thing that I think might be nice moving forward is to develop a robust
set of models and test sets that involve at least two languages. I'm
thinking Portuguese would be a good one in addition to English since:

   - several of us speak it (I'm a non-native speaker who lived in Brazil
   for a couple of years -- who else?)
   - there are truly free annotated resources for it:
   http://www.linguateca.pt/
   - it's pretty darn widely spoken in the world, both as a first and a second
   language

Doing something like this would help push the annotation effort forward as
well. E.g., committing to support a language means we need to get at least
some annotations going for each level of analysis we want to support, and
that will in turn spur development on the tool that Jorn has been putting
together.

Jason

-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
