Critical Assessment of Information Extraction in Biology. Data sets are available from Resources/Corpora and require registration.

BioCreative IV

Call for Participation (Events) [2012-11-15]

BioCreative IV Challenge and Workshop

October 7-9, 2013 NCBI, Bethesda, Maryland, USA

BioCreative: Critical Assessment of Information Extraction in Biology is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. Building on the success of the previous BioCreative Challenge Evaluations and Workshops (BioCreative I, II, II.5, III, and the 2012 workshop) [1-5], the BioCreative Organizing Committee will host the BioCreative IV Challenge at NCBI, National Institutes of Health, Bethesda, Maryland, on October 7-9, 2013. One key goal of BioCreative is the active involvement of the text mining user community in the design of the tracks, the preparation of corpora, and the testing of interactive systems. For BioCreative IV, the selection of the tracks has been driven in part by suggestions from the biocuration community during the BioCreative 2012 workshop, and by our goal of addressing interoperability -- a major barrier to the adoption of text mining tools.

BioCreative IV will consist of five tracks:

  • Track 1: Interoperability (BioC) – Development of an interoperable BioNLP module that can be seamlessly coupled to BioC compliant modules;
  • Track 2: Chemical and Drug Named Entity Recognition (CHEMDNER) – Detection of mentions of chemical compounds and drugs, in particular those chemical entity mentions that can subsequently be linked to a chemical structure;
  • Track 3: Comparative Toxicogenomics Database (CTD) Curation – Provision of Web Services to identify gene, chemical, disease, and action term mentions supporting CTD curation in PubMed abstracts;
  • Track 4: Gene Ontology (GO) curation – Development of automatic methods to aid GO curators in identifying articles with curatable GO information (triage) and extracting gene function terms and the associated evidence sentences in full-length articles;
  • Track 5: Interactive Curation (IAT) – Demonstration and evaluation of web-based systems addressing user-defined tasks, evaluated by curators on performance and usability.



    Registration

    Teams can participate in one or more of these tracks. Team registration will start on November 19 and will continue until the final commitment requested by the individual tracks.

    To register a team go to


    Important Dates

      • November 19, 2012: Team registration starts (all tracks)
      • January 2013: Guidelines release; expression of interest email (IAT)
      • May 2013: Module description submission, January-June (BioC); training data release; system document submission (IAT)
      • June 2013: Sample data collection and evaluation scripts; acceptance communication (IAT)
      • July 2013: Training/development data release; test; pairing with biocurators and dataset preparation (IAT)
      • August 2013: Module submission to repository (BioC); systems training phase; curators training (IAT)
      • September 2013: Test set release and results; evaluation (IAT)
      • October 7-9, 2013: Workshop



    Organizing Committee

  • Cecilia Arighi, University of Delaware, USA
  • Kevin Cohen, University of Colorado, USA
  • Lynette Hirschman, MITRE Corporation, USA
  • Martin Krallinger, Spanish National Cancer Centre, CNIO, Spain
  • Zhiyong Lu, National Center for Biotechnology Information (NCBI), NIH, USA
  • Carolyn Mattingly, North Carolina State University, USA
  • Catalina O. Tudor, University of Delaware, USA
  • Alfonso Valencia, Spanish National Cancer Centre, CNIO, Spain
  • Thomas Wiegers, North Carolina State University, USA
  • John Wilbur, National Center for Biotechnology Information (NCBI), NIH, USA
  • Cathy Wu, University of Delaware and Georgetown University, USA


    User Advisory Group

    Chairs: Cecilia Arighi and Zhiyong Lu

  • Judith Blake, MGI, The Jackson Laboratory, USA
  • Andrew Chatr-aryamontri, BioGrid, Canada
  • Stan Laulederkind, Rat Genome Database, USA
  • Donghui Li, TAIR, USA
  • Sherri Matis, Astrazeneca, USA
  • Fiona McCarthy, Agbase, USA
  • Peter McQuilton, Flybase, UK
  • Sandra Orchard, IntAct, UK
  • Phoebe Roberts, Pfizer, USA
  • Mary Schaeffer, MaizeGDB, USA
  • Kimberly Van-Auken, Wormbase, USA


    References

    1. Hirschman, L., A. Yeh, C. Blaschke, and A. Valencia, Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics, 2005. 6 Suppl 1: p. S1. PMCID:PMC1869002
    2. Krallinger, M., A. Morgan, L. Smith, F. Leitner, L. Tanabe, J. Wilbur, L. Hirschman, and A. Valencia, Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge. Genome Biol, 2008. 9 Suppl 2: p. S1. PMCID:PMC2559980
    3. Leitner, F., S.A. Mardis, M. Krallinger, G. Cesareni, L.A. Hirschman, and A. Valencia, An Overview of BioCreative II.5. IEEE/ACM Trans Comput Biol Bioinform, 2010. 7(3): p. 385-99.
    4. Arighi, C.N., Z. Lu, M. Krallinger, K.B. Cohen, J. Wilbur, A. Valencia, L. Hirschman, and C.H. Wu, Overview of the BioCreative III Workshop. BMC Bioinformatics, 2011. 12 Suppl 8: p. S1.
    5. Krallinger, M., F. Leitner, C. Rodriguez-Penagos, and A. Valencia, Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biol, 2008. 9 Suppl 2: p. S4. PMCID:PMC2559988
    6. Wu, C.H., C.N. Arighi, K.B. Cohen, L. Hirschman, M. Krallinger, Z. Lu, C. Mattingly, A. Valencia, T.C. Wiegers, and W.J. Wilbur, Editorial: BioCreative-2012 Virtual Issue. Database (Oxford), 2012 (in press).


    Track 1-BioC: The BioCreative Interoperability Initiative

    Goals for this task include promoting simplicity, interoperability, and broad use and reuse of text mining modules. For this purpose we propose BioC, an interchange format for biomedical natural language processing tools. BioC is a family of simple XML formats, specified by a DTD, for sharing text documents and annotations. The proposed annotation approach allows many different annotations to be represented, including sentences, tokens, parts of speech, and named entities such as genes or diseases. BioC packages in both C++ and Java can be downloaded; the code includes basic classes for working with data in BioC format, as well as a couple of simple applications and examples. Participating teams will be asked to:

      a) Prepare a BioC module that can be seamlessly coupled with the rest of the BioC code and definitions, and that performs an important NLP or BioNLP task. The task is left to participating teams to choose, implement, and validate for the purposes of this challenge. If you are participating in any other BioCreative track and are producing a BioC-compliant module, you are welcome to submit your module to Track 1. If the module you wish to produce is independent of the other tracks, we request that you submit a proposal to the program committee for approval by the end of July 2013. The program committee wishes to approve all proposed independent projects at this stage to avoid overlapping tasks. Such a proposal can consist of a couple of paragraphs giving a high-level description of the module you wish to develop and contribute to the repository.
      b) Where necessary, prepare a corpus or otherwise make data available, in the BioC format, that will allow the challenge committee to test and judge the performance of the module produced in part a).
      c) Write a paper that describes the BioC module produced in part a), the data provided in part b), the evaluation of the module, and its proposed uses. The paper will be published as part of the BioCreative IV proceedings, and a selected number of papers will also be considered for publication in a special journal issue. An accepted module, along with the accompanying paper and data where appropriate, is understood to be a contribution to the BioC public repository. The final products must be submitted to the repository by September 8, 2013 to give the organizers sufficient time to judge the acceptability of a product.

    More information here
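    As a rough illustration of the kind of document the BioC format describes, the Python sketch below builds a one-document collection containing a single passage and a gene annotation. The element names follow the published BioC DTD (collection/document/passage/annotation), but the document ID, text, and annotation are invented for illustration.

    ```python
    import xml.etree.ElementTree as ET

    def build_bioc_collection():
        """Build a minimal BioC-style collection: one document, one passage,
        one named-entity annotation. Offsets are character offsets into the
        passage text, as in BioC."""
        collection = ET.Element("collection")
        ET.SubElement(collection, "source").text = "PubMed"
        doc = ET.SubElement(collection, "document")
        ET.SubElement(doc, "id").text = "12345"  # invented document id
        passage = ET.SubElement(doc, "passage")
        ET.SubElement(passage, "offset").text = "0"
        ET.SubElement(passage, "text").text = "BRCA1 is linked to breast cancer."
        ann = ET.SubElement(passage, "annotation", {"id": "T1"})
        infon = ET.SubElement(ann, "infon", {"key": "type"})
        infon.text = "gene"
        # "BRCA1" starts at character 0 and is 5 characters long
        ET.SubElement(ann, "location", {"offset": "0", "length": "5"})
        ET.SubElement(ann, "text").text = "BRCA1"
        return collection

    xml_string = ET.tostring(build_bioc_collection(), encoding="unicode")
    print(xml_string)
    ```

    Because the format is plain XML with a small, fixed vocabulary of elements, a module in any language can consume one BioC file and emit another, which is what makes pipeline coupling straightforward.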


    Track 2-Chemical compound and drug name recognition task (CHEMDNER)

    The goal of this task is to promote the implementation of systems that are able to detect mentions of chemical compounds and drugs, in particular those chemical entity mentions that can subsequently be linked to a chemical structure. Participating teams will provide the following predictions:

      a) Given a set of documents, return a list of chemical entities described within each of these documents.
      b) For a given document, provide the start and end indices corresponding to all the chemical entities mentioned in this document.
    For these two tasks the organizers will release training and test data collections. More information here.
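    As a toy sketch of sub-task b), the snippet below reports start/end character indices for chemical mentions found with a tiny hand-made lexicon. The lexicon entries are illustrative only; a competitive CHEMDNER system would use a trained tagger and a far larger chemical vocabulary.

    ```python
    import re

    # Toy lexicon standing in for a real chemical dictionary (hypothetical).
    CHEMICAL_LEXICON = ["aspirin", "ibuprofen", "acetylsalicylic acid"]

    def find_chemical_mentions(text):
        """Return (start, end, mention) tuples for lexicon matches, i.e. the
        start and end indices asked for in sub-task b). Longer entries are
        tried first so multi-word names win over their substrings."""
        pattern = re.compile(
            "|".join(sorted(map(re.escape, CHEMICAL_LEXICON),
                            key=len, reverse=True)),
            re.IGNORECASE)
        return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

    mentions = find_chemical_mentions("Aspirin (acetylsalicylic acid) reduces fever.")
    print(mentions)  # [(0, 7, 'Aspirin'), (9, 29, 'acetylsalicylic acid')]
    ```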


    Track 3-Comparative Toxicogenomics Database (CTD) Interoperability task

    CTD is a publicly available resource that seeks to promote understanding of the mechanisms by which drugs and environmental chemicals influence biological processes and human health. CTD curators manually curate chemical-gene/protein interactions, chemical-disease relationships, and gene-disease relationships. Building upon the tasks proposed in Track 1 (interoperability) and Track 2 (chemical and drug entity recognition), and driven by the direct biocuration needs at CTD, participating teams will be asked to provide Web Services that will enable CTD to send text passages to their remote sites in order to identify gene, chemical, disease, and action term mentions, each within the context of CTD's controlled vocabulary structure. Participating groups will be provided with training materials, a complete training corpus, and detailed testing results, including NER recall and processing response times.

    More information here
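    The Web Services contract could look something like the following JSON-in/JSON-out sketch: a passage comes in, and typed mentions keyed to a controlled vocabulary come back. The field names, vocabulary identifiers, and stub lookup are all placeholders; the actual interface is defined by the track organizers.

    ```python
    import json

    # Hypothetical term -> (type, vocabulary id) table; identifiers below are
    # placeholders, not real CTD controlled-vocabulary entries.
    VOCABULARY = {
        "cadmium": ("chemical", "chem:placeholder-1"),
        "apoptosis": ("action", "action:placeholder-2"),
    }

    def annotate_passage(request_json):
        """Given a JSON request carrying a text passage, return chemical and
        action-term mentions with character offsets and vocabulary ids. The
        stub lookup stands in for a real NER component."""
        passage = json.loads(request_json)["text"]
        mentions = []
        for term, (kind, vocab_id) in VOCABULARY.items():
            pos = passage.lower().find(term)
            if pos != -1:
                mentions.append({"term": term, "type": kind,
                                 "id": vocab_id, "offset": pos})
        return json.dumps({"mentions": mentions})

    response = annotate_passage(json.dumps({"text": "Cadmium induces apoptosis."}))
    print(response)
    ```

    The point of the shape, not the stub, is what matters: CTD posts a passage, and the remote service answers with structured, vocabulary-anchored mentions it can fold into its curation workflow.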


    Track 4-Gene Ontology Curation (GO task)

    The goal of this task is to promote research and tool development for assisting gene ontology (GO) term curation from biomedical literature, an important and common task for many Model Organism Databases such as WormBase. Participating teams will first be asked to classify whether or not an article is relevant for GO curation (a document triage task). Next, for those curatable articles, they will be asked to predict GO annotations, together with one or more supporting evidence sentences (an information extraction task). The organizers will provide teams with gold-standard GO annotations for each full text article, including evidence sentences for each GO annotation. This data set will be annotated by members of the User Advisory Group (UAG) who are active members of a variety of Model Organism Databases.

    More information here
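    For intuition, the triage step can be caricatured as scoring an abstract for curation-worthy evidence phrases. The cue list below is invented for illustration; a real triage system would train a classifier on the released gold-standard annotations over full text.

    ```python
    # Invented cue phrases suggestive of curatable GO evidence (hypothetical).
    GO_CUES = ["is required for", "localizes to", "regulates", "binds"]

    def triage_score(abstract):
        """Count GO-style cue phrases in the abstract; higher means more
        likely to contain curatable GO information."""
        text = abstract.lower()
        return sum(text.count(cue) for cue in GO_CUES)

    def is_curatable(abstract, threshold=1):
        """Binary triage decision: keep the article if it scores at or
        above the threshold."""
        return triage_score(abstract) >= threshold

    print(is_curatable("The kinase Abc1 localizes to the mitochondrion and "
                       "is required for respiration."))
    ```

    The extraction step would then run only on articles the triage step keeps, attaching GO terms and the sentences that support them.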

    Track 5-User Interactive task (IAT)

    The goal of this task is to foster interaction between system developers and biocurators in order to advance the production and adoption of text mining tools useful for biocuration. Participating teams will present a web-based system that can address a biocuration task of their choice. This is a demonstration task, but the systems will be formally evaluated by biocurators based on (i) performance (time-on-task and accuracy of the text mining-assisted task as compared to manual or some reference curation), and (ii) a subjective measure of usability/utility of the system via a user questionnaire. Participating teams should engage two end users to assist in the development phase and in evaluation dataset selection/annotation; the organizers, together with the User Advisory Group, will engage appropriate biocurators for the testing phase.

    Participating teams should submit a document by May 31, 2013 describing the system and the proposed biocuration task(s), providing the URL of a functioning system, and addressing the following aspects:

      a) Relevance and Impact: Teams should clearly describe i) the targeted user community (teams should be familiar and compliant with the needs and standards used by this community); ii) the proposed task and how it aligns with the guidelines set by the targeted user community; and iii) use cases for the application.
      b) User Interactivity: We are asking for web-based text mining systems with user interactivity (such as highlighting, editing, and exporting results). A document with the system's requirements and examples is available in the download section at the end of the page linked here.
      c) System Performance: The system should have undergone some internal benchmarking to show that it performs reasonably for user testing. Teams should (i) describe the dataset used for such benchmarking, including source (who annotated it) and size; (ii) report metrics: precision, recall, and F-measure, and/or mean average precision (MAP); and (iii) indicate the level at which the metrics were calculated (sentence vs. document), which should correspond to the level tested in the user evaluation.
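    The benchmarking metrics named in c) can be computed as in this small sketch; the identifiers and counts are invented examples.

    ```python
    def precision_recall_f1(predicted, gold):
        """Compute precision, recall, and F-measure over sets of predicted
        and gold items. Items can be sentence or document identifiers,
        matching the level at which the evaluation is run."""
        predicted, gold = set(predicted), set(gold)
        true_positives = len(predicted & gold)
        precision = true_positives / len(predicted) if predicted else 0.0
        recall = true_positives / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # 4 predictions, 3 gold items, 2 correct -> P = 1/2, R = 2/3, F = 4/7
    p, r, f = precision_recall_f1({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
    print(p, r, f)
    ```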

    Teams interested in participating in this track should email Cecilia Arighi by January 15, 2013, with the subject "BioCreative IV IAT". This will allow early planning and coordination of the task; however, this notification is not a commitment. More information here.
