lucene-general mailing list archives

From Martin Krallinger <>
Subject Lucene usage for BioCreative challenge task
Date Thu, 20 Dec 2012 14:55:52 GMT
CALL FOR PARTICIPATION: CHEMDNER task: Chemical compound and drug name
recognition task

The CHEMDNER task (part of the BioCreative IV competition) is a community
challenge on named entity recognition of chemical compounds.

Teams that participated in previous BioCreative challenges used Lucene to
index names (such as genes and proteins) in articles, or to store document
vectors as a Lucene index for text classification purposes. We expect that
Lucene will also be a useful resource for the chemical compound recognition
and indexing task. We thus encourage Lucene users to participate in the
chemical compound named entity recognition task of BioCreative IV.

The goal of this task is to promote the implementation of systems that are
able to detect mentions of chemical compounds and drugs in text. The
recognition of chemical entities is also crucial for other subsequent text
processing strategies, such as detection of drug-protein interactions,
adverse effects of chemical compounds or the extraction of pathway and
metabolic reaction relations. A range of different methods have been
explored for the recognition of chemical compound mentions including
machine learning based approaches, rule-based systems and different types
of dictionary-lookup strategies. The Weka framework has been successfully
used by several participating teams in previous biomedical text mining
tasks posed in the context of the BioCreative challenge.

We foresee a considerable interest in the result of this task by the
NLP/text mining community on one side, as well as by the bioinformatics,
drug discovery/biomedicine and chemoinformatics communities on the other
side. As has been the case in previous BioCreative efforts (resulting in
high impact papers in the field), we expect that successful participants
will have the opportunity to publish their system descriptions in a journal

CHEMDNER is one of the tracks posed at the BioCreative IV community
challenge.

We invite participants to submit results for the CHEMDNER task providing
predictions for one or both of the following subtasks:

a) Given a set of documents, return for each document a ranked list of the
chemical entities described in it [Chemical document indexing sub-task].

b) Provide for a given document the start and end indices corresponding to
all the chemical entities mentioned in this document [Chemical entity
mention recognition sub-task].
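
To make the two subtasks concrete, here is a minimal sketch of a
dictionary-lookup approach in Python. It is only an illustration under
stated assumptions: the tiny name list, the function names, the exclusive
end offset and the frequency-based ranking are all hypothetical choices,
not the official CHEMDNER data, submission format or evaluation criteria.

```python
import re
from collections import Counter

# Hypothetical toy dictionary; a real system would use a curated lexicon.
CHEMICAL_NAMES = ["aspirin", "acetylsalicylic acid", "ibuprofen", "ethanol"]


def find_mentions(text, names=CHEMICAL_NAMES):
    """Subtask (b) sketch: return (start, end, matched_text) per dictionary hit.

    Offsets are character indices into `text`; `end` is exclusive
    (an assumption, the task's own convention may differ).
    """
    # Try longer names first so multi-word names win over shorter alternatives.
    alternatives = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


def rank_entities(mentions):
    """Subtask (a) sketch: rank distinct entities by mention count.

    Frequency is only a stand-in for whatever confidence score a real
    system would produce.
    """
    counts = Counter(matched.lower() for _, _, matched in mentions)
    return [name for name, _ in counts.most_common()]


doc = (
    "Aspirin (acetylsalicylic acid) and ibuprofen were compared; "
    "aspirin was dissolved in ethanol."
)
mentions = find_mentions(doc)
ranking = rank_entities(mentions)
```

Here `mentions` holds one tuple per occurrence, so a name that appears
twice yields two entries, while `ranking` lists each distinct name once.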

For these two tasks the organizers will release training and test data
collections. The task organizers will provide details on the annotation
guidelines used, define a list of criteria for relevant chemical compound
entity types, and select the documents for annotation.

Teams can participate in the CHEMDNER task by registering for track 2 of
BioCreative IV. You can register additionally for other tracks too. To
register your team, go to the following page that provides more detailed

Mailing list and contact information:
You can post questions related to the CHEMDNER task to the BioCreative
mailing list. To register for the BioCreative mailing list, please visit
the following page:

CHEMDNER is part of the BioCreative evaluation effort. The BioCreative
Organizing Committee will host the BioCreative IV Challenge evaluation
workshop at NCBI, National Institutes of Health, Bethesda, Maryland, on
October 7-9, 2013.

Martin Krallinger, Spanish National Cancer Research Center (CNIO)
Obdulia Rabal, University of Navarra, Spain
Julen Oyarzabal, University of Navarra, Spain
Alfonso Valencia, Spanish National Cancer Research Center (CNIO)

- Vazquez, M., Krallinger, M., Leitner, F., & Valencia, A. (2011). Text
Mining for Drugs and Chemical Compounds: Methods, Tools and Applications.
Molecular Informatics, 30(6-7), 506-519.
- Krallinger M, et al. The Protein-Protein Interaction tasks of BioCreative
III: classification/ranking of articles and linking bio-ontology concepts
to full text. BMC Bioinformatics. 2011;12 Suppl 8:S3
- Corbett, P., Batchelor, C., & Teufel, S. (2007). Annotation of chemical
named entities. BioNLP 2007: Biological, translational, and clinical
language processing, 57-64.
- Klinger, R., Kolářik, C., Fluck, J., Hofmann-Apitius, M., & Friedrich, C.
M. (2008). Detection of IUPAC and IUPAC-like chemical names.
Bioinformatics, 24(13), i268-i276.
- Hettne, K. M., Stierum, R. H., Schuemie, M. J., Hendriksen, P. J.,
Schijvenaars, B. J., Mulligen, E. M. V., ... & Kors, J. A. (2009). A
dictionary to identify small molecules and drugs in free text.
Bioinformatics, 25(22), 2983-2991.
- Yeh, A., Morgan, A., Colosimo, M., & Hirschman, L. (2005). BioCreAtIvE
task 1A: gene mention finding evaluation. BMC bioinformatics, 6(Suppl 1),
- Smith, L., Tanabe, L. K., Ando, R. J., Kuo, C. J., Chung, I. F., Hsu, C.
N., ... & Wilbur, W. J. (2008). Overview of BioCreative II gene mention
recognition. Genome Biology, 9(Suppl 2), S2.

Martin Krallinger
Structural Computational Biology Group
Structural Biology and BioComputing Programme
Spanish National Cancer Research Centre (CNIO)

