Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative III

Announcement [2010-03-09]

The 3rd Critical Assessment for Information Extraction in Biology challenge, BioCreative III, is a community-wide effort for evaluating text mining and information extraction systems applied to the biomedical domain. The BioCreative III workshop, to be held in September 2010, will bring together stakeholders from the biocuration community with researchers in text mining and natural language processing applied to the biomedical literature.

BioCreative III will have three tasks:

  1. Cross-species gene normalization [GN] using full text
  2. Extraction of protein-protein interactions from full text [PPI], including document selection, identification of interacting proteins and identification of interacting protein pairs, as in BioCreative II
  3. Interactive demonstration task for gene indexing and retrieval using full text [IAT]

Background

BioCreative arose out of the needs of working biologists, biological curators and bioinformaticians to access the wealth of information in the literature, and to link this information to biological databases, using standard ontologies and controlled vocabularies. BioCreative focuses on comparison of methods and community assessment of scientific progress. Previous BioCreative challenges have attracted considerable interest not only in the bio text mining community, but also in the bioinformatics and biological database domains, resulting in two special journal issues and useful data resources for the development of biomedical text mining systems [1][2]. BioCreative has been organized through collaborations between text mining groups, biological database curators and bioinformatics researchers. BioCreative III (2010) and BioCreative IV (2012) will be funded in part by the US National Science Foundation, with an explicit focus on developing (interactive) applications to meet the needs of end users, especially curators.

BioCreative III Structure and Timetable

BioCreative III will begin in January 2010 and will culminate in the BioCreative III workshop, September 13-15, 2010 in Bethesda, Maryland, USA. It will consist of three tracks:

  1. GN: The gene normalization task will produce a list of the EntrezGene/UniProtKB identifiers for all the genes/proteins mentioned in a collection of full text articles, similar to BioCreative II task 2, but not restricted to human. This task may include dividing the genes found in a document into those considered most important and those of lesser importance; criteria for this distinction are still under discussion.
  2. PPI: The protein-protein interaction task will be similar to that of BioCreative II; it will involve selection/prioritization of relevant papers and the identification of interacting proteins and pairs of proteins based on the presence of experimental evidence for these interactions.
  3. IAT: The BioCreative III interactive task (IAT 2010) is a demonstration task, and will focus on indexing (identifying which genes are being studied in an article and linking these genes to standard database identifiers) and gene-oriented document retrieval (identifying full-text papers relevant to a selected gene). This approach will facilitate the definition of metrics and acquisition of data that are necessary for designing the evaluation of the interactive systems in the BioCreative IV challenge. For this initial trial of an interactive task in biomedical text mining, text mining groups will have the opportunity to demonstrate their own interfaces and have biocurators try them out. Minimum requirements will be distributed along with a formal definition of the task, which will be focused on gene and protein normalization, since this is an annotation task that cuts across many communities. Text mining and curation groups that are already collaborating are welcome to participate together. NCBI will also host a web page that will allow interested parties to identify potential collaborators. If you are interested in collaborating, please send a short description of your role in a collaboration (e.g., curator, text miner, system developer), what you could contribute in a collaboration, and any URLs linking to further information or resources you may want to provide, to wilbur@ncbi.nlm.nih.gov.
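
For readers new to gene normalization, the GN track in item 1 can be pictured with a toy sketch. This is not the task definition or any participant's method: the lexicon, the species cue words, the `DEMO:` identifiers and the function name `normalize_genes` are all hypothetical, and ranking genes by mention count is only one possible stand-in for the importance criteria that are still under discussion.

```python
import re
from collections import Counter

# Hypothetical lexicon: (species, lowercase gene name) -> made-up identifier.
# A real system would draw on EntrezGene or UniProtKB records instead.
LEXICON = {
    ("human", "brca1"): "DEMO:H1",
    ("mouse", "brca1"): "DEMO:M1",
    ("human", "tp53"): "DEMO:H2",
}

# Hypothetical cue words used to guess which species an article is about.
SPECIES_CUES = {
    "human": "human", "patient": "human", "patients": "human",
    "mouse": "mouse", "mice": "mouse", "murine": "mouse",
}


def normalize_genes(text):
    """Return identifiers for genes mentioned in text, most frequent first."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())

    # Guess the species from the most frequent cue word; give up if none.
    species_counts = Counter(SPECIES_CUES[t] for t in tokens if t in SPECIES_CUES)
    if not species_counts:
        return []
    species = species_counts.most_common(1)[0][0]

    # Map each gene mention to the identifier for the guessed species.
    gene_counts = Counter(
        LEXICON[(species, t)] for t in tokens if (species, t) in LEXICON
    )
    return [gene_id for gene_id, _ in gene_counts.most_common()]
```

The same gene name resolves to a different identifier depending on the guessed species, which is the difficulty that dropping the human-only restriction introduces.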

A major goal of BioCreative III/IV is to put useful tools into the hands of end users. To encourage exploration of user interface and visualization capabilities, BioCreative III will include an opportunity to showcase other interactive systems of relevance to the molecular biology community (beyond gene/protein normalization). The BioCreative organizers will solicit candidates for this session; to be selected, a system must a) be up and running and accessible for use via the internet; b) have been applied to a real task; and c) be judged of interest to the end user community by the BioCreative III User Advisory Committee.

To join the BioCreative mailing list, please visit http://biocreative.sourceforge.net/mailing.html. For information on BioCreative III, see http://www.biocreative.org/news/chapter/biocreative-iii

References

  1. Hirschman et al., Overview of BioCreAtIvE: critical assessment of information extraction for biology, BMC Bioinformatics (2005) vol. 6 Suppl 1 pp. S1
  2. Krallinger et al., Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge, Genome Biology (2008) vol. 9 Suppl 2 pp. S1