Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of krallinger.martin@gmail.com
 designates 74.125.82.195 as permitted sender)
MIME-Version: 1.0
Date: Thu, 19 Feb 2015 17:53:21 +0100
Message-ID: 
 <CAMx+MKHPfhgcwBYYRqUxFvGREffK+5N7OVEgLr=7=5iBPf1U5Q@mail.gmail.com>
Subject: CALL FOR PARTICIPATION: CHEMDNER-patents task (Biocreative V)
From: Martin Krallinger <krallinger.martin@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=047d7b5d4e0606d1d7050f73c632

--047d7b5d4e0606d1d7050f73c632
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

CALL FOR PARTICIPATION: CHEMDNER-patents task: Chemical and drug name
recognition task in patents
(http://www.biocreative.org/tasks/biocreative-v/track-2-chemdner/)


The CHEMDNER-patents task (BioCreative V - http://www.biocreative.org) is a
community challenge on named entity recognition of chemical compounds in
patents and text classification.


*Task Organizers*

   - Martin Krallinger, Spanish National Cancer Research Centre
   - Florian Leitner, Universidad Politecnica de Madrid
   - Obdulia Rabal, Center for Applied Medical Research (CIMA), University
   of Navarra
   - Julen Oyarzabal, Center for Applied Medical Research (CIMA),
   University of Navarra
   - Alfonso Valencia, Spanish National Cancer Research Centre


Registration and participation

Teams interested in the CHEMDNER-patents task should register for track 2
of BioCreative V:

http://www.biocreative.org/events/biocreative-v/biocreative-v-team/


Background

This task will address the automatic extraction of chemical and biological
data from medicinal chemistry patents. The identification and integration
of all information contained in these patents (e.g., chemical structures,
their synthesis and associated biological data) is currently a very hard
task, not only for database curators but also for life science researchers
and biomedical text mining experts. Despite the valuable characterizations
of biomedically relevant entities such as chemical compounds, genes and
proteins contained in patents, academic research in the area of text mining
and information extraction using patent data has been minimal.
Pharmaceutical patents covering chemical compounds provide information on
their therapeutic applications and, in most cases, on their primary
biological targets.


*CHEMDNER-patents tasks*

This task covers three essential steps for the identification of
biomedically relevant descriptions of chemical compounds:

   - *CEMP* (chemical entity mention in patents, main task): the detection
   of chemical named entity mentions in patents (start and end indices
   corresponding to all the chemical entities).
   - *CPD* (chemical passage detection, text classification task): the
   detection of sentences that mention chemical compounds.
   - *CER* (chemical entity relation): the extraction of chemical compound
   relations, covering biologically relevant chemical relations (e.g.,
   chemical-biological target relations).

Participating teams do not need to send results for all three sub-tasks.
They can also send results only for individual sub-tasks.
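
As an informal illustration of what the CEMP and CPD outputs described above
look like, here is a minimal sketch. It is not the official submission
format (see the task page for that), and the lexicon, the names Mention,
find_mentions and mentions_chemical, and the dictionary-lookup approach are
all assumptions made for this example; a real system would use a trained
recognizer.

```python
import re
from typing import List, NamedTuple


class Mention(NamedTuple):
    start: int  # offset of the first character of the mention
    end: int    # offset one past the last character of the mention
    text: str   # the matched surface string


# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = ["acetylsalicylic acid", "ethanol", "ibuprofen"]


def find_mentions(text: str, lexicon: List[str] = TOY_LEXICON) -> List[Mention]:
    """CEMP-style output: start/end character offsets of chemical mentions."""
    if not lexicon:
        return []
    # Try longer names first so that a longer name wins over a shorter prefix.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = "|".join(re.escape(name) for name in names)
    return [
        Mention(m.start(), m.end(), m.group(0))
        for m in re.finditer(pattern, text, flags=re.IGNORECASE)
    ]


def mentions_chemical(sentence: str) -> bool:
    """CPD-style output: does this sentence mention any chemical?"""
    return bool(find_mentions(sentence))
```

For a sentence s, each returned Mention satisfies s[start:end] == text, so
the offsets can be checked directly against the source string.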


CHEMDNER session at the BioCreative V workshop

At the BioCreative V Workshop, to be held in Seville (Spain) on September
9-11, 2015, there will be a session devoted to the CHEMDNER-patents task.
This session will include an overview talk presenting the datasets used and
the results obtained by the participating teams. A number of teams will also
be invited to present their systems. We also plan to have a discussion
session where teams, task organizers and domain experts will discuss the
results obtained and future steps. Finally, during the poster session, all
teams will be able to present their participation strategies.


CHEMDNER patents workshop proceedings and journal special issue

Participating teams will be invited to contribute to the Proceedings of
the Fifth BioCreative Challenge Evaluation Workshop. A selected number of
top-performing teams will also be invited to contribute a system
description paper to a special issue of a relevant journal in the field.


Previous CHEMDNER (BioCreative IV)

The CHEMDNER-BioCreative IV special issue was published in the Journal of
Cheminformatics: Volume 7 Supplement 1, 'Text mining for chemistry and the
CHEMDNER track'. It focused on the detection of chemical entities in
PubMed abstracts. The entire supplement is available from the *Journal of
Cheminformatics*: http://www.jcheminf.com/supplements/7/S1


The special issue includes an overview paper on the task, a paper on the
CHEMDNER corpus and 13 selected system description papers. Top-scoring
teams obtained an F-score of 87.39% for the recognition of chemical entity
mentions, a very competitive result already close to the human
inter-annotator agreement (IAA). In addition, some systems showed further
improvements over their original submissions.
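
For readers unfamiliar with the metric, the F-score is the harmonic mean of
precision and recall. The sketch below assumes exact matching of
(document id, start, end) spans and a balanced F1; the official evaluation
script may differ in its details, and the names Span and
precision_recall_f1 are introduced here only for illustration.

```python
from typing import Set, Tuple

# (document id, start offset, end offset); a hypothetical span representation.
Span = Tuple[str, int, int]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F-score over mention spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With three gold spans and four predicted spans of which two are correct,
precision is 2/4, recall is 2/3, and the F-score is 4/7.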


In addition, participating teams provided short system description papers
for the BioCreative workshop proceedings; see:

http://www.biocreative.org/resources/publications/chemdner-proceed-publications/


*References*

   1. Krallinger, M., Leitner, F., Rabal, O., Vazquez, M., Oyarzabal, J., &
   Valencia, A. CHEMDNER: The drugs and chemical names extraction
   challenge. Journal of Cheminformatics 2015, 7(Suppl 1):S1
   2. Krallinger, M. et al. The CHEMDNER corpus of chemicals and drugs and
   its annotation principles. Journal of Cheminformatics 2015, 7(Suppl
   1):S2
   3. Krallinger, M., Leitner, F., Rabal, O., Vazquez, M., Oyarzabal, J., &
   Valencia, A. (2013, October). Overview of the chemical compound and drug
   name recognition (CHEMDNER) task. In BioCreative Challenge Evaluation
   Workshop (Vol. 2, p. 2).
   4. Akhondi, S. A., Klenner, A. G., Tyrchan, C., Manchala, A. K.,
   Boppana, K., Lowe, D., ... & Muresan, S. (2014). Annotated Chemical
   Patent Corpus: A Gold Standard for Text Mining. PLoS ONE, 9(9), e107477.
   5. Grego, T., Pęzik, P., Couto, F. M., & Rebholz-Schuhmann, D. (2009).
   Identification of chemical entities in patent documents. In Distributed
   Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and
   Ambient Assisted Living (pp. 942-949). Springer Berlin Heidelberg.
   6. Jessop, D. M., Adams, S. E., & Murray-Rust, P. (2011). Mining
   chemical information from Open patents. Journal of Cheminformatics,
   3(1), 40.
   7. Gurulingappa, H., Müller, B., Klinger, R., Mevissen, H. T.,
   Hofmann-Apitius, M., Friedrich, C. M., & Fluck, J. (2010). Prior Art
   Search in Chemistry Patents Based On Semantic Concepts and Co-Citation
   Analysis. In TREC.
   8. Wishart, D. S., Knox, C., Guo, A. C., Shrivastava, S., Hassanali, M.,
   Stothard, P., ... & Woolsey, J. (2006). DrugBank: a comprehensive
   resource for in silico drug discovery and exploration. Nucleic Acids
   Research, 34(suppl 1), D668-D672.
   9. Zhu, F., Han, B., Kumar, P., Liu, X., Ma, X., Wei, X., ... & Chen, Y.
   (2010). Update of TTD: therapeutic target database. Nucleic Acids
   Research, 38(suppl 1), D787-D791.

--047d7b5d4e0606d1d7050f73c632--