BioCreative - Track 3- CDR

Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative V

Track 3- CDR [2014-12-18]

The 2015 CDR challenge has been successfully completed! Please find the overview paper below:

Wei CH, Peng Y, Leaman R, Davis AP, Mattingly CJ, Li J, Wiegers TC, and Lu Z. Overview of the BioCreative V Chemical Disease Relation (CDR) Task. Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, pp. 154-166

Thanks to all the participating teams! Please find system descriptions in the BioCreative V Workshop proceedings.

The task data is also freely available now for the research community.

Organizers

Zhiyong Lu, NCBI, NLM, NIH
Thomas Wiegers, North Carolina State University
Contact: zhiyong.lu@nih.gov; tcwieger@ncsu.edu

Registration

Teams interested in the CDR task should register for Track 3 and join our mailing list to receive the latest task announcements.

Background

Chemicals, diseases, and their relations are among the most searched topics by PubMed users worldwide (1-3) as they play central roles in many areas of biomedical research and healthcare such as drug discovery and safety surveillance. Although the ultimate goal in drug discovery is to develop chemicals for therapeutics, recognition of adverse drug reactions between chemicals and diseases is important for improving chemical safety and toxicity studies and facilitating new screening assays for pharmaceutical compound survival. In addition, identification of chemicals as biomarkers can be helpful in informing potential relationships between chemicals and pathologies. Hence, manual annotation of such mechanistic and biomarker/correlative chemical-disease relations (CDR) from unstructured free text into structured knowledge has become an important theme for several bioinformatics databases such as the Comparative Toxicogenomics Database (CTD) (4). Here we consider the words ‘drug’ and ‘chemical’ to be interchangeable.

Manual curation of CDRs from the literature is costly and insufficient to keep up with the rapid literature growth. Despite previous attempts (e.g. (5-7)), automatic detection of biomedical relations in free text, from identifying relevant concepts (e.g. diseases and chemicals (8-10)) to extracting relations, remains challenging. In addition, few relation extraction tools are freely available and, to the best of our knowledge, such tools have had limited success in real-world applications.

Tasks

Through BioCreative V, we propose a challenge task of automatic extraction of mechanistic and biomarker chemical-disease relations from the biomedical literature in support of biocuration, new drug discovery and drug safety surveillance. The task aims to advance text-mining research on relationship extraction and provide practical benefits to biocuration. The specific subtasks are:

(A) Disease Named Entity Recognition and Normalization (DNER). An intermediate step for automatic CDR extraction is disease named entity recognition and normalization, which was found to be highly difficult on its own (8) in previous BioCreative CTD tasks (10,11). For this subtask, participating systems will be given PubMed abstracts and asked to return normalized disease concept identifiers.

(B) Chemical-induced disease relation extraction (CID). Participating systems will be provided with the raw text of PubMed articles as input and asked to return a ranked list of chemical-disease pairs, with normalized confidence scores, for which a drug-induced disease relation is asserted in the abstract.

Note that teams can choose to participate in either subtask or both. To assist teams interested only in subtask (B), disease and chemical entity taggers are provided in our supporting resources.
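As an illustration of the output expected for subtask (B), the following minimal Python sketch turns raw system scores into a ranked list of chemical-disease pairs with confidence scores scaled to [0, 1]. The record layout (`CidPrediction`), the function name, and the choice of min-max scaling are assumptions made for this example only; the task description does not prescribe a particular normalization method or output format.

```python
from typing import NamedTuple


class CidPrediction(NamedTuple):
    """One predicted chemical-induced disease pair (hypothetical layout)."""
    pmid: str          # document identifier
    chemical_id: str   # normalized chemical concept identifier
    disease_id: str    # normalized disease concept identifier
    score: float       # raw, unnormalized system score


def rank_predictions(preds: list[CidPrediction]) -> list[CidPrediction]:
    """Min-max scale scores to [0, 1] and sort from most to least confident.

    Min-max scaling is an assumption of this sketch, not a task requirement.
    """
    if not preds:
        return []
    lo = min(p.score for p in preds)
    hi = max(p.score for p in preds)
    span = hi - lo
    scaled = [
        # If all scores are equal there is nothing to scale; use 1.0.
        p._replace(score=(p.score - lo) / span if span else 1.0)
        for p in preds
    ]
    return sorted(scaled, key=lambda p: p.score, reverse=True)


# Placeholder identifiers, not real PubMed or concept IDs.
example = [
    CidPrediction("doc1", "C1", "D1", 2.0),
    CidPrediction("doc1", "C2", "D2", 6.0),
    CidPrediction("doc1", "C3", "D3", 4.0),
]
for p in rank_predictions(example):
    print(p.pmid, p.chemical_id, p.disease_id, p.score)
```

A system whose scores are already probabilities could skip the scaling step and only sort.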

Task Data and Evaluation

For system development and evaluation, a new corpus with both mention- and concept-level annotations will be provided for the challenge task. That is, we will mark up every chemical and disease occurrence in the abstract with both text spans and concept IDs, and highlight associated CDR relations between relevant entities within or across sentences. For evaluation, standard precision, recall, and F-score will be used.
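The evaluation measures can be sketched in a few lines of Python. The example below assumes that gold and predicted relations are represented as sets of (document, chemical ID, disease ID) triples, that counts are pooled over all documents, and that "F-score" means the balanced F1 measure; these are assumptions of this sketch, and the official evaluation kit remains the authoritative implementation.

```python
def precision_recall_f1(gold: set, predicted: set) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for two sets of annotations.

    An item counts as a true positive only if it appears in both sets.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder identifiers, not real PubMed or concept IDs.
gold = {("doc1", "C1", "D1"), ("doc1", "C2", "D2"), ("doc2", "C3", "D3")}
predicted = {
    ("doc1", "C1", "D1"),
    ("doc2", "C3", "D3"),
    ("doc2", "C4", "D4"),
    ("doc2", "C5", "D5"),
}
p, r, f = precision_recall_f1(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Here two of the four predictions are correct and two of the three gold relations are found, so precision is 2/4, recall is 2/3, and F1 is 4/7. The same function applies to subtask (A) if the triples are replaced by (document, disease ID) pairs.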

Timeline

December 2014: Task announcement

March 16, 2015: Release of sample set (50 articles) and annotation guideline (available now)

March 30, 2015: Release of training set (500 articles) (available now)

April 7, 2015: Several supporting software tools are now available

May 22, 2015: Task evaluation kit is now available

May 29, 2015: Release of development data (500 articles) (available now)

July 8, 2015: Webinar slides and task evaluation instructions released (available now)

~~July 29, 2015~~ August 12, 2015: Test your web services at http://tinyurl.com/bc5cdr-api

~~August 5, 2015~~ August 14, 2015: Register your web service URLs at http://tinyurl.com/bc5cdr-test

~~August 7, 2015~~ August 17, 2015: Team results submission via web services

August 31, 2015: BioCreative V Workshop Paper (4-6 pages)

Workshop and Publications

Participating teams will be invited to publish their system description in Proceedings of the Fifth BioCreative Challenge Evaluation Workshop. A selected number of teams will also be invited to present their system at the BioCreative V Workshop and contribute a submission to a journal special issue.

References

1. Islamaj Dogan, R., Murray, G.C., Neveol, A., et al. (2009) Understanding PubMed user search behavior through log analysis. Database (Oxford), 2009, bap018.

2. Lu, Z. (2011) PubMed and beyond: a survey of web tools for searching biomedical literature. Database (Oxford), 2011, baq036.

3. Neveol, A., Islamaj Dogan, R., Lu, Z. (2011) Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. J Biomed Inform, 44, 310-318.

4. Davis, A.P., Grondin, C.J., Lennon-Hopkins, K., et al. (2014) The Comparative Toxicogenomics Database's 10th year anniversary: update 2015. Nucleic Acids Res, 2014 Oct 17, gku935.

5. Xu, R., Wang, Q. (2014) Automatic construction of a large-scale and accurate drug-side-effect association knowledge base from biomedical literature. J Biomed Inform.

6. Kang, N., Singh, B., Bui, C., et al. (2014) Knowledge-based extraction of adverse drug events from biomedical text. BMC Bioinformatics, 15, 64.

7. Gurulingappa, H., Mateen-Rajput, A., Toldo, L. (2012) Extraction of potential adverse drug events from medical case reports. J Biomed Semantics, 3, 15.

8. Leaman, R., Islamaj Dogan, R., Lu, Z. (2013) DNorm: disease name normalization with pairwise learning to rank. Bioinformatics, 29, 2909-2917.

9. Leaman, R., Wei, C.H., Lu, Z. (2015) tmChem: a high performance approach for chemical named entity recognition and normalization. J Cheminform, 7(Suppl 1), S3.

10. Wiegers, T.C., Davis, A.P., Mattingly, C.J. (2014) Web services-based text-mining demonstrates broad impacts for interoperability and process simplification. Database (Oxford), 2014, bau050.

11. Wiegers, T.C., Davis, A.P., Mattingly, C.J. (2012) Collaborative biocuration--text-mining development task for document prioritization for curation. Database (Oxford), 2012, bas037.