opennlp-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Baldridge <jasonbaldri...@gmail.com>
Subject Re: Fwd: LREC 2012 Workshop on Language Resource Merging - Extended Deadline to Feb. 22, 2012
Date Fri, 24 Feb 2012 03:01:23 GMT
+1

On Wed, Feb 22, 2012 at 4:21 PM, Joern Kottmann <kottmann@gmail.com> wrote:

> The code we could add to the formats package and ship
> with OpenNLP directly.
>
> +1 to do that
>
> Jörn
>
> On Wed, Feb 22, 2012 at 10:53 PM, Jason Baldridge <
> jasonbaldridge@gmail.com> wrote:
>
>> The OANC is great, and I'm glad to hear that training those models on it
>> work well. It would certainly be a good idea to look at the MASK subset
>> and
>> see how things work out with that. (Also, just to get a sense of the size
>> of it.)
>>
>> If you are up for it, would you be interested in adding code and data to
>> train on OANC for the OpenNLP-Models git repo?
>>
>> https://github.com/utcompling/OpenNLP-Models
>>
>> -Jason
>>
>> On Wed, Feb 22, 2012 at 3:22 AM, Katrin Tomanek
>> <katrin.tomanek@averbis.com>wrote:
>>
>> >
>> >
>> >
>> > Hi,
>> >
>> > as for corpora we could be using to train freely available models for
>> > opennlp on: I have tested OANC (the "open" section of the american
>> > national corpus, http://www.**americannationalcorpus.org/**OANC<
>> http://www.americannationalcorpus.org/OANC>
>>
>> > ).
>> >
>> > Although the OANC is automatically tagged, I obtain quite OK results
>> > (sentence splitting, tokenization, POS Tagging, NP Chunking).
>> >
>> > So, maybe we can provide models trained on OANC for download?
>> >
>> > Moreover: Nancy Ide just told me, there was a subset of the OANC (called
>> > MASK) which was manually validated and is also freely available. This
>> > might be even better.
>> >
>> > What do you think?
>> >
>> > Best,
>> > Katrin
>> >
>> >
>> >
>> > On 02/20/2012 11:01 PM, Jason Baldridge wrote:
>> >
>> >> Might be some things we should look at wrt to our goals of creating
>> >> annotated resources. -j
>> >>
>> >> ---------- Forwarded message ----------
>> >> From: ELRA ELDA Information<info@elda.org>
>> >> Date: Wed, Feb 15, 2012 at 7:36 AM
>> >> Subject: LREC 2012 Workshop on Language Resource Merging - Extended
>> >> Deadline to Feb. 22, 2012
>> >>
>> >
>> >  To:
>> >>
>> >>
>> >> Call for Papers
>> >> LREC 2012 Workshop on: Language Resource Merging
>> >> 22 May 2012 – Afternoon Session
>> >>
>> >> EXTENDED Submission deadline: 22 FEBRUARY
>> >>
>> >> CONTEXT
>> >> The availability of adequate language resources has been a well-known
>> >> bottleneck for most high-level language technology applications, e.g.
>> >> Machine Translation, parsing, and Information Extraction, for at least
>> 15
>> >> years, and the impact of the bottleneck is becoming all the more
>> apparent
>> >> with the availability of higher computational power and massive
>> storage,
>> >> since modern language technologies are capable of using far more
>> resources
>> >> than the community produces. The present landscape is characterized by
>> the
>> >> existence of numerous scattered resources, many of which have differing
>> >> levels of coverage, types of information and granularity. Taken
>> >> singularly,
>> >> existing resources do not have sufficient coverage, quality or richness
>> >> for
>> >> robust large-scale applications, and yet they contain valuable
>> information
>> >> (Monachini et al. 2004 and 2006; Soria et al. 2006; Molinero, Sagot and
>> >> Nicolas 2009; Necsulescu et al. 2011). Differing technology or
>> application
>> >> requirements, ignorance of the existence of certain resources, and
>> >> difficulties in accessing and using them, has led to the proliferation
>> of
>> >> multiple, unconnected resources that, if merged, could constitute a
>> much
>> >> richer repository of information augmenting either coverage or
>> >> granularity,
>> >> or both, and consequently multiplying the number of potential language
>> >> technology applications. Merging, combining and/or compiling larger
>> >> resources from existing ones thus appears to be a promising direction
>> to
>> >> take.
>> >> The re-use and merging of existing resources is not altogether unknown.
>> >> For
>> >> example, WordNet (Fellbaum, 1998) has been successfully reused in a
>> >> variety
>> >> of applications. But this is the exception rather than the rule; in
>> fact,
>> >> merging, and enhancing existing resources is uncommon, probably
>> because it
>> >> is by no means a trivial task given the profound differences in
>> formats,
>> >> formalisms, metadata, and linguistic assumptions.
>> >> The language resource landscape is on the brink of a large change,
>> >> however.
>> >> With the proliferation of accessible metadata catalogues, and resource
>> >> repositories (such as the new META-SHARE (
>> http://www.meta-net.eu/meta-***
>> >> * <http://www.meta-net.eu/meta-**>
>> >> share<http://www.meta-net.eu/**meta-share<
>> http://www.meta-net.eu/meta-share>>)
>>
>> >> infrastructure), a potentially
>> >> large number of existing resources will be more easily located,
>> accessed
>> >> and downloaded. Also, with the advent of distributed platforms for the
>> >> automatic production of language resources, such as PANACEA (
>> >> http://www.panacea-lr.eu/), new language resources and linguistic
>> >> information capable of being integrated into those resources will be
>> >> produced more easily and at a lower cost. Thus, it is likely that
>> >> researchers and application developers will seek out resources already
>> >> available before developing new, costly ones, and will require methods
>> for
>> >> merging/combining various resources and adapting them to their specific
>> >> needs.
>> >> Up to the present day, most resource merging has been done manually,
>> with
>> >> only a small number of attempts reported in the literature towards
>> >> (semi-)automatic merging of resources (Crouch&  King 2005; Pustejovsky
>> et
>> >> al. 2005; Molinero, Sagot and Nicolas 2009; Necsulescu et al. 2011). In
>> >> order to take a further step towards the scenario depicted above, in
>> which
>> >> resource merging and enhancing is a reliable and accessible first step
>> for
>> >> researchers and application developers, experience and best practices
>> must
>> >> be shared and discussed, as this will help the whole community avoid
>> any
>> >> waste of time and resources.
>> >>
>> >> AIMS OF THE WORKSHOP
>> >> This half-day workshop is meant to be part of a series of meetings
>> >> constituting an ongoing forum for sharing and evaluating the results of
>> >> different methods and systems for the automatic production of language
>> >> resources (the first one was the LREC 2010 Workshop on Methods for the
>> >> Automatic Production of Language Resources and their Evaluation
>> Methods).
>> >> The main focus of this workshop is on (semi-)automatic means of merging
>> >> language resources, such as lexicons, corpora and grammars. Merging
>> makes
>> >> it possible to re-use, adapt, and enhance existing resources, alongside
>> >> new, automatically created ones, with the goal of reducing the manual
>> >> intervention required in language resource production, and thus
>> ultimately
>> >> production costs.
>> >>
>> >> WORKSHOP TOPICS
>> >> The topics of the workshop are related to best practices, methods,
>> >> techniques and experimental results regarding the merging of various
>> types
>> >> of language resources, such as lexicons and corpora, especially in
>> support
>> >> of language technology applications. In particular, new methods for
>> >> automatic merging with a view towards reducing human intervention will
>> be
>> >> most welcome.
>> >> Topics for submission include, but are not limited to:
>> >> Experiments on (semi-)automatic merging of automatically produced
>> >> resources
>> >> Experiments on the merging of two or more existing resources containing
>> >> the
>> >> same or different levels of linguistic information
>> >> Studies or experiments on merging resources at different levels of
>> >> granularity (corpora, lexicons, grammars)
>> >> Studies or experiments on unifying, mapping or converting encoding
>> formats
>> >> Comparison between different resources and mapping algorithms to
>> provide
>> >> desired merging
>> >> Use of linguistic information from different sources in high-level
>> >> language
>> >> applications
>> >> Use of new, merged language resources in language technology
>> applications
>> >>
>> >> WORKSHOP WEBSITE:
>> >> http://panacea-lr.eu/en/news/****project/2011/12/19/lrec-2012-****<
>> http://panacea-lr.eu/en/news/**project/2011/12/19/lrec-2012-**>
>> >> merging-lr-workshop/<http://**panacea-lr.eu/en/news/project/**
>> >> 2011/12/19/lrec-2012-merging-**lr-workshop/<
>> http://panacea-lr.eu/en/news/project/2011/12/19/lrec-2012-merging-lr-workshop/
>> >
>>
>> >> >
>> >>
>> >> SUBMISSIONS
>> >> Interested participants must submit a preliminary paper of about 4-6
>> pages
>> >> including references (between 2000-2500 words). For the submission
>> please
>> >> use the online form on START LREC Conference Manager at:
>> >> https://www.softconf.com/****lrec2012/MergingLR2012/<
>> https://www.softconf.com/**lrec2012/MergingLR2012/>
>> >> <https:**//www.softconf.com/lrec2012/**MergingLR2012/<
>> https://www.softconf.com/lrec2012/MergingLR2012/>
>>
>> >> >
>> >> When submitting a paper from the START page, authors will be asked to
>> >> provide essential information about resources (in a broad sense, i.e.
>> also
>> >> technologies, standards, evaluation kits, etc.) that have been used for
>> >> the
>> >> work described in the paper or are a new result of your research.
>> >> For further information on this new initiative, please refer to
>> >> http://www.lrec-conf.org/****lrec2012/?LRE-Map-2012<
>> http://www.lrec-conf.org/**lrec2012/?LRE-Map-2012>
>> >> <http://**www.lrec-conf.org/lrec2012/?**LRE-Map-2012<
>> http://www.lrec-conf.org/lrec2012/?LRE-Map-2012>
>>
>> >> >
>> >> Papers will be peer-reviewed by the workshop Program Committee.
>> >>
>> >> IMPORTANT DATES
>> >> Deadline for paper submission: 22 February 2012 (23:59 CET +1)
>> >> **EXTENDED**
>> >> Notification of acceptance: 15 March 2012
>> >> Submission of camera-ready version of papers: 31 March 2012
>> >> Workshop date: 22 May 2012 – Afternoon Session
>> >>
>> >> ORGANIZING COMMITTEE
>> >> Núria Bel, UPF, Barcelona, Spain
>> >> Maria Gavrilidou, ILSP-“Athena”, Athens, Greece,
>> >> Monica Monachini, CNR-ILC, Pisa, Italy
>> >> Valeria Quochi, CNR-ILC, Pisa, Italy
>> >> Laura Rimell, University of Cambridge, UK
>> >>
>> >> Contacts
>> >> lrec12_workshop_merging@ilc.****cnr.it <http://cnr.it
>> ><lrec12_workshop_**
>> >> merging@ilc.cnr.it <lrec12_workshop_merging@ilc.cnr.it>>
>>
>> >>
>> >> PROGRAMME COMMITTEE:
>> >> Victoria Arranz, ELDA, Paris, France
>> >> Paul Buitelaaar, National University of Ireland, Galway, Ireland
>> >> Nicoletta Calzolari, CNR-ILC, Pisa, Italy
>> >> Olivier Hamon, ELDA, Paris, France
>> >> Aleš Horák, Masaryk University, Brno, Czech Republic
>> >> Nancy Ide, Vassar College, Mass. USA
>> >> Bernardo Magnini, FBK, Trento, Italy
>> >> Paola Monachesi, Utrecht University, Utrecht, The Netherlands
>> >> Jan Odijk, , Utrecht University, Utrecht, The Netherlands
>> >> Muntsa Padró, IULA, Barcellona, Spain
>> >> Karel Pala, Masaryk University, Brno, Czech Republic
>> >> Thierry Poibeau University of Cambridge, UK and CNRS, Paris, France
>> >> Benoît Sagot, INRIA, Paris, France
>> >> Kiril Simov, Bulgarian Academy of Sciences, Sofia, Bulgaria
>> >> Claudia Soria, CNR-ILC, Pisa, Italy
>> >> Maurizio Tesconi, CNR-IIT, Pisa
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>> > --
>> > Dr. Katrin Tomanek
>> > Averbis GmbH
>> > Tennenbacher Strasse 11
>> > D-79106 Freiburg
>> >
>> > Fon: +49 (0) 761 - 203 97696
>> > Fax: +49 (0) 761 - 203 97694
>> > E-Mail: katrin.tomanek@averbis.com
>> >
>> > Geschäftsführer: Dr. med. Philipp Daumke, Dr. Kornél Markó
>> > Sitz der Gesellschaft: Freiburg i. Br.
>> > AG Freiburg i. Br., HRB 701080
>> >
>>
>>
>>
>> --
>> Jason Baldridge
>> Associate Professor, Department of Linguistics
>> The University of Texas at Austin
>> http://www.jasonbaldridge.com
>> http://twitter.com/jasonbaldridge
>>
>
>


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message