Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BC Workshop '12

Call For Participation [2011-09-08]

BioCreative-2012 Workshop

Interactive Text Mining in the Biocuration Workflow

April 4-5, 2012, Washington DC, USA

BioCreative: Critical Assessment of Information Extraction in Biology is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. Building on the success of the previous BioCreative Challenge Evaluations and Workshops (BioCreative I, II, II.5, and III) [1-4], and in preparation for the BioCreative IV Challenge Evaluations, the BioCreative Organizing Committee will host the BioCreative-2012 Workshop in Washington DC on April 4-5, 2012, in conjunction with the Fifth International Biocuration Conference to be held April 2-4, 2012.

The BioCreative-2012 Workshop on Interactive Text Mining in the Biocuration Workflow aims to bring together the biocuration and text mining communities to develop and evaluate interactive text mining tools and systems that improve utility and usability in the biocuration workflow. To achieve this goal, the workshop will consist of three Tracks: (I-Triage) a collaborative biocuration-text mining development task for document prioritization for curation; (II-Workflow) a biocuration workflow survey and analysis task; and (III-Interactive TM) an interactive text mining and user evaluation task. Note that this workshop differs from the free-standing BioCreative workshops (BioCreative I-III, held in 2004-2010, and BioCreative IV, to be held in 2013): systems will be formally evaluated, but there will be no competition.

The BioCreative-2012 Workshop Proceedings, covering the participating text mining systems and biocuration workflows presented at the workshop, will be published. A special VIRTUAL ISSUE of DATABASE: The Journal of Biological Databases and Curation will be published for the Track overview papers and possibly also for selected systems demonstrating significant contributions to biocuration workflows. Text mining and biocuration teams interested in participating in any of the three Tracks should register by November 15, 2011 here. Note that at this stage registration does not imply commitment; final commitment will be requested in subsequent stages.

The BioCreative workshop schedule will be synchronized with the Biocuration meeting to allow attendance at the plenary sessions and other sessions relevant to text mining.

  • Background
  • Track I-Triage
  • Track II-Workflow
  • Track III-Interactive TM
  • Important Dates
  • Organizing Committee
  • Note
  • References
    Background

    Since its inception, BioCreative has benefited from close collaborations between the community of text mining developers and well-curated biological databases including GOA-EBI, IntAct, MINT and BioGRID. To address the utility and usability of text mining tools for biocuration, an Interactive Task was introduced as a new feature in BioCreative III. It was a demonstration task focusing on gene-based document retrieval.

    In past BioCreative efforts we have studied various types of information as they appear in free text and how they may be recognized by a computer. While progress has been made in the recognition of named entities such as genes and proteins, of protein-protein interactions (PPI), and of textual evidence for GO codes and for PPI interaction types and detection methods, it is evident that computer algorithms still fall short of perfection. We are therefore looking for ways to use computers to assist human curators rather than to replace them. Results from previous BioCreative meetings showed that, to assist a human curator with a computer, it is necessary to understand as fully and clearly as possible what the curator does [5]. Because many databases contain different types of biological information useful to the scientific enterprise, there are multiple opportunities to study how human experts process natural language text, extract information, and enter it into a database.

    Holding this workshop as a satellite of the International Biocuration Conference is a great opportunity to bring together the biocuration and text mining communities to work towards tailoring text mining systems to specific curation workflows. This effort will lay the groundwork for the next BioCreative IV Challenge, culminating in the BioCreative IV Workshop to be held in Washington DC in 2013.


    Track I: Triage (Collaborative Biocuration-Text Mining Development Task for Document Prioritization for Curation)

    In collaboration with the Comparative Toxicogenomics Database (CTD), we are calling for text mining teams to participate in developing tools/systems tailored to CTD’s literature-based curation workflow, specifically its triage stage (prioritization of articles for curation). A detailed CTD Curation Overview, including the prioritization scheme and an annotated literature corpus, is provided by the CTD team and can be found here.

    The CTD curators will work closely with the participating text mining teams during system development and will provide data sets for training and testing. Prior to the workshop, the resulting systems will be evaluated for precision and recall on the benchmarking set, and CTD curators will further evaluate them for utility and usability through a web interface. During the workshop, participating systems will be presented in a demo session, where biocuration participants will have the opportunity to test them. The benchmarking results and the CTD curator evaluation will also be presented. Following the workshop, selected systems will be included in a planned manuscript (Track I overview paper) summarizing the evaluation results, for publication in DATABASE. The CTD triage task is planned as one of the tasks for the BioCreative IV Challenge.

    Register to participate in Track I by November 15, 2011 (registration at this point is not an official commitment). Final commitment is required when the test data are released (February 6, 2012), by email to (subject: BioCreative-2012 Track I). Benchmarking results and a description of the system should be returned by February 20, 2012. Developers should provide access to their system for web-based testing by March 1, 2012.


    Track II: Workflow (Biocuration Workflow Survey and Analysis Task)

    We are calling for curation teams to produce a short document (up to 6 pages, including figures) describing their curation process and workflow, from the criteria for selecting articles (journal articles or abstracts) for curation through to the resulting database entries. To help with this important effort, we have put together an outline of the kinds of information we think would be useful to text mining developers seeking to produce algorithms and tools that assist the curation process. More details on the structure of the Track and an example can be found here.

    Participating curation teams will be invited to give oral or poster presentations of their curation workflows at the workshop. Following the workshop, selected workflows will be included in a planned manuscript (Track II overview paper) summarizing the biocuration workflow survey and analysis, for publication in DATABASE. Selected biocuration teams may also be invited to participate in the BioCreative IV Challenge by providing workflows and annotated literature corpora as targets for text mining tool development and evaluation.

    Register a biocuration team by November 15, 2011 (registration at this point is not an official commitment). Submission of the workflow document is considered the final commitment. Email the document to (subject: BioCreative-2012 Track II) by December 31, 2011.


    Track III: Interactive TM (Interactive Text Mining and User Evaluation Task)

    We are calling for text mining teams to participate in an interactive system demonstration and for biocurators to participate in a user evaluation.

    For Text Mining Teams

    We invite text mining teams to submit system descriptions that highlight a text mining/NLP system with respect to a specific biocuration task. All submissions will be evaluated by the BioCreative Organizing Committee according to the following criteria:

    1. Relevance and Impact: Is the system currently being used in a biocuration task/workflow?
    2. Adaptability: Is it robust and adaptable to other related biocuration tasks (i.e., can it be used by multiple databases/resources)?
    3. Interactivity: Does it provide an interactive web interface for testing by biocurators?
    4. Performance: Can the system be benchmarked, with precision and recall reported for the task?

    For system evaluation, the participating teams should: (i) provide a set of annotated examples as a practice test for curators prior to the evaluation; and (ii) suggest biocurator(s) who could annotate a literature corpus to serve as the gold standard for evaluation before the workshop.

    During the workshop, the selected text mining systems will be presented in a demo session, where biocuration participants will have the opportunity to test them. While this is a demonstration task and not a competition, the user study and the data collected will provide a basis for developing evaluation metrics for the Interactive Task in the BioCreative IV Challenge.

    More details on the structure of the Track and an example are provided here.

    Register a text mining team to participate in Track III by November 15, 2011 (no official commitment). Submission of the system description is considered the final commitment. Email the system description to (subject: BioCreative-2012 Track III) by December 31, 2011.

    For Biocurators

    We invite biocurators to participate, prior to and at the workshop, in a user study of the text mining system of their choice. The user study will involve (i) curation of the literature corpus by the biocurators, both manually and with the text mining system; (ii) recording of the user interactions with the system (with logs of all queries and clicks); and (iii) a post-study survey. The evaluation will compare time-on-task for manual versus system-assisted curation, as well as precision and recall against the gold standard. More details about this track can be found here.
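    The call does not specify how curator or system annotations will be matched against the gold standard. The following is only a minimal Python sketch of precision and recall, assuming each annotation can be reduced to a hashable item and counts as correct only on an exact match. The function name `precision_recall` and the (PMID, gene) pairs are hypothetical, invented for illustration.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of predicted annotations against a gold standard.

    Assumption: annotations are hashable items compared by exact match;
    duplicates are ignored because both inputs are converted to sets.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted annotations that appear in the gold standard.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold-standard annotations that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: annotations as (PMID, gene symbol) pairs.
gold_standard = {("12345", "TP53"), ("12345", "BRCA1"),
                 ("67890", "EGFR"), ("67890", "MYC")}
curator_annotations = {("12345", "TP53"), ("67890", "EGFR"), ("67890", "KRAS")}

p, r = precision_recall(curator_annotations, gold_standard)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.67 recall=0.50"
```

    Exact set matching is the simplest choice; an actual evaluation might instead normalize identifiers or give credit for partial matches before comparing.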

    Registration for biocurators is now open. We encourage biocurators to register now and to provide their final commitment once the systems are posted for selection. To register, send an email to . The gold standard will be made available by the Organizing Committee upon user registration, on or about March 1, 2012. The biocurator evaluation results should be submitted to by March 25, 2012.

    The list of systems available for evaluation can be found here.

    For all

    The results and observations from the overall evaluation will be presented by the Track coordinators. Results of the user study of the demo systems will be included in a planned manuscript (Track III overview paper) for publication in DATABASE.


    Important Dates

    Track | Item | Deadline | Submit via | Comment
    I, II, III | Team Registration | November 15, 2011 | web | register here
    I | Text Mining System Description | February 20, 2012 | email to | Subject: BioCreative-2012 Track I
    I | Submission of Benchmarking Results | February 20, 2012 | email to | Subject: BioCreative-2012 Track I-results
    I | Interface Available for Testing | March 1, 2012 | email to | Subject: BioCreative-2012 Track I-testing
    II | Submission of Biocuration Workflow | December 31, 2011 | email to | Subject: BioCreative-2012 Track II
    III | Text Mining System Description | December 31, 2011 | email to | Subject: BioCreative-2012 Track III
    III | System Benchmarking Results | March 1, 2012 | email to | Subject: BioCreative-2012 Track III-result
    III | Submission of Practice Test | March 1, 2012 | email to | Subject: BioCreative-2012 Track III-test
    III | Interface Available for Testing | March 1, 2012 | email to | Subject: BioCreative-2012 Track III-interface
    III | Biocurator's Evaluation | March 20-25, 2012 | email to | Subject: BioCreative-2012 Track III-biocurator
    I, II, III | Workshop | April 4 (noon) - April 5 (5pm), 2012 | | Results from testing in Tracks I and III will be presented


    Workshop Registration

    Workshop registration will open on February 13, 2012.

    Organizing Committee

    Cecilia Arighi, University of Delaware, USA
    Kevin Cohen, University of Colorado, USA
    Lynette Hirschman, MITRE Corporation, USA
    Martin Krallinger, Spanish National Cancer Centre, CNIO, Spain
    Zhiyong Lu, National Center for Biotechnology Information (NCBI), NIH, USA
    Carolyn Mattingly, Mount Desert Island Biological Laboratory (MDIBL), USA
    Alfonso Valencia, Spanish National Cancer Centre, CNIO, Spain
    Thomas Wiegers, Mount Desert Island Biological Laboratory (MDIBL), USA
    John Wilbur, National Center for Biotechnology Information (NCBI), NIH, USA
    Cathy Wu, University of Delaware and Georgetown University, USA



    Biocuration-2012 Call for Papers, Oral Presentations and Posters
    The Biocuration Conference provides a unique opportunity to present text mining to the broad biocuration community. We have worked with the conference organizers to include a session on “Literature collection, text mining and curation” on the morning of April 4, before the BioCreative workshop starts at noon on April 4. You are invited to submit your work for publication, or for a poster, oral or workshop presentation. For more details about specifications and deadlines, click here.


    References

    1. Hirschman, L., A. Yeh, C. Blaschke, and A. Valencia, Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics, 2005. 6 Suppl 1: p. S1. PMCID:PMC1869002
    2. Krallinger, M., A. Morgan, L. Smith, F. Leitner, L. Tanabe, J. Wilbur, L. Hirschman, and A. Valencia, Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge. Genome Biol, 2008. 9 Suppl 2: p. S1. PMCID:PMC2559980
    3. Leitner, F., S.A. Mardis, M. Krallinger, G. Cesareni, L.A. Hirschman, and A. Valencia, An Overview of BioCreative II.5. IEEE/ACM Trans Comput Biol Bioinform, 2010. 7(3): p. 385-99.
    4. Arighi, C.N., Z. Lu, M. Krallinger, K.B. Cohen, J. Wilbur, A. Valencia, L. Hirschman, and C.H. Wu, Overview of the BioCreative III Workshop. BMC Bioinformatics, 2011. 12 Suppl 8: p. S1.
    5. Krallinger, M., F. Leitner, C. Rodriguez-Penagos, and A. Valencia, Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biol, 2008. 9 Suppl 2: p. S4. PMCID:PMC2559988
