From: Martin Krallinger
To: user@hadoop.apache.org
Date: Thu, 19 Feb 2015 17:53:21 +0100
Subject: CALL FOR PARTICIPATION: CHEMDNER-patents task (Biocreative V)

CALL FOR PARTICIPATION: CHEMDNER-patents task: Chemical and drug name recognition task in patents (http://www.biocreative.org/tasks/biocreative-v/track-2-chemdner/)

The CHEMDNER-patents task (BioCreative V - http://www.biocreative.org) is a community challenge on chemical named entity recognition and text classification in patents.

Task Organizers

  • Martin Krallinger, Spanish National Cancer Research Centre
  • Florian Leitner, Universidad Politecnica de Madrid
  • Obdulia Rabal, Center for Applied Medical Research (CIMA), University of Navarra
  • Julen Oyarzabal, Center for Applied Medical Research (CIMA), University of Navarra
  • Alfonso Valencia, Spanish National Cancer Research Centre

Registration and participation

Teams interested in the CHEMDNER-patents task should register for track 2 of BioCreative V:

http://www.biocreative.org/events/biocreative-v/biocreative-v-team/

Background

This task will address the automatic extraction of chemical and biological data from medicinal chemistry patents. Identifying and integrating all the information contained in these patents (e.g., chemical structures, their synthesis and associated biological data) is currently very hard, not only for database curators but also for life sciences researchers and biomedical text mining experts. Although patents contain valuable characterizations of biomedically relevant entities such as chemical compounds, genes and proteins, academic research on text mining and information extraction from patent data has been minimal. Pharmaceutical patents covering chemical compounds provide information on their therapeutic applications and, in most cases, on their primary biological targets.

CHEMDNER-patents tasks

The task covers three essential steps for identifying biomedically relevant descriptions of chemical compounds:

  • CEMP (chemical entity mention in patents, main task): the detection of chemical named entity mentions in patents (start and end indices corresponding to all the chemical entities).
  • CPD (chemical passage detection, text classification task): the detection of sentences that mention chemical compounds.
  • CER (chemical entity relation): the extraction of chemical compound relations, covering biologically relevant chemical relations (e.g. chemical-biological target relations).

Participating teams do not need to send results for all three sub-tasks. They can also send results only for individual sub-tasks.
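
To make the CEMP and CPD outputs concrete, here is a minimal Python sketch of what "start and end indices" and sentence-level detection could look like. Everything in it is an assumption for illustration only: the toy lexicon, the function names and the tuple layout are hypothetical and do not reflect the official submission format or any participating system. A real system would likely rely on a trained tagger rather than dictionary lookup.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicon, used only for illustration.
TOY_LEXICON = ["aspirin", "acetylsalicylic acid", "ibuprofen"]


def find_mentions(text: str, lexicon: List[str] = TOY_LEXICON) -> List[Tuple[int, int, str]]:
    """CEMP-style output: (start, end, surface form) per mention, end exclusive."""
    # Try longer names first so multi-word terms are preferred in the alternation.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    regex = re.compile(pattern, re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0)) for m in regex.finditer(text)]


def mentions_chemical(sentence: str) -> bool:
    """CPD-style output: True if the sentence contains at least one mention."""
    return bool(find_mentions(sentence))


if __name__ == "__main__":
    sentence = "Aspirin (acetylsalicylic acid) inhibits COX-1."
    for start, end, surface in find_mentions(sentence):
        print(start, end, surface)
    print(mentions_chemical("The mixture was stirred overnight."))
```

The character offsets index directly into the input string, so `text[start:end]` always recovers the mention, which makes the output easy to check against a gold annotation.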

CHEMDNER session at the BioCreative V workshop

The BioCreative V Workshop, to be held in Seville (Spain) on September 9-11, 2015, will include a session devoted to the CHEMDNER patents task. This session will include an overview talk presenting the datasets used and the results obtained by the participating teams. A number of teams will also be invited to present their systems. We also plan to hold a discussion session where teams, task organizers and domain experts will discuss the results and future steps. Finally, during the poster session, all teams will be able to present their strategies.

CHEMDNER patents workshop proceedings and journal special issue

Participating teams will be invited to contribute to the Proceedings of the Fifth BioCreative Challenge Evaluation Workshop. A selected number of top-performing teams will also be invited to contribute a system description paper to a special issue of a relevant journal in the field.

Previous CHEMDNER (Biocreative IV)

The CHEMDNER-Biocreative IV special issue was published in the Journal of Cheminformatics: Volume 7 Supplement 1, 'Text mining for chemistry and the CHEMDNER track'. It focused on the detection of chemical entities from PubMed abstracts. The entire supplement is available at: http://www.jcheminf.com/supplements/7/S1

The special issue includes an overview paper on the task, a paper on the CHEMDNER corpus and 13 selected system description papers. Top-scoring teams obtained an F-score of 87.39% for the recognition of chemical entity mentions, a very competitive result already close to the human inter-annotator agreement (IAA). Some systems also showed further improvements over their original submissions.
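
For readers less familiar with the metric, the F-score quoted above is the harmonic mean of precision and recall. The following Python sketch shows one common way to compute it for mention detection, assuming strict matching where a predicted span counts only if its document id, start and end all agree with a gold span. The span layout and function name are hypothetical, and the official evaluation script may differ in its details.

```python
from typing import Set, Tuple

# Hypothetical span layout: (document id, start offset, end offset).
Span = Tuple[str, int, int]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Strict-match precision, recall and balanced F-score over mention spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("doc1", 0, 7), ("doc1", 9, 29), ("doc1", 40, 45)}
    predicted = {("doc1", 0, 7), ("doc1", 9, 29), ("doc1", 50, 55), ("doc1", 60, 62)}
    p, r, f = precision_recall_f1(gold, predicted)
    print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Under strict matching a span that is off by a single character counts as both a false positive and a false negative, which is why boundary errors weigh heavily on this kind of score.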

In addition, participating teams provided a short system description paper for the BioCreative workshop proceedings, see:

http://www.biocreative.org/resources/publications/chemdner-proceed-publications/

