Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative VI

Track 2: Text-mining services for Human Kinome Curation [2017-02-06]

What's new?

Submission process detailed below (e-mail).

The submission deadline was extended to August 1.

SUBMISSION PROCESS

The computed runs must be sent by e-mail to the following two e-mail addresses:

julien.gobeill@hesge.ch

julien.gobeill@hotmail.fr

as attachments. The submission e-mail should specify the names of the files (runs) included in the attachment, the subtask ID (1/2/3), the axis (GO BP, DISEASE), and the team ID. Please include a short description of the methods and data used in your strategies.

OVERVIEW

Text mining teams are invited to develop and test approaches that assist database curators in selecting relevant articles and passages for the curation of protein kinases. Literature triage is an Information Retrieval task: it aims at retrieving and filtering articles that are likely to be relevant for curation. Snippet selection goes one step further and is an Information Extraction task: it aims at extracting, from a given article, a short piece of text that contains enough information to make an annotation.

The Kinome Track dataset covers a significant fraction of the human Kinome (300 proteins out of the approximately 500 protein kinases), and is ready to be integrated into the neXtProt database in 2017. It contains comprehensive manual annotations about Gene Ontology biological processes and NCI diseases, each associated with a PMID. Notably, this is the first time that a database from the SIB Swiss Institute of Bioinformatics has participated as a data provider in a text mining competition.

The Kinome Track is organized into three subtasks:

  1. abstracts triage
  2. fulltexts triage
  3. snippet selection

This table provides an overview of all subtasks.

SUBTASKS

1 - Abstracts triage

Abstracts collection : here (BioC format)
Training data : here
Test data : here
Short description : given a kinase and a curation axis, retrieve relevant citations for curation. Maximum of 10 runs per participant.
Collection : 5.3 million MEDLINE citations in BioC format.
Input : pairs made of a kinase (e.g. "Activin receptor type-1B" - P36896) and a curation axis (biological processes or diseases).
Output : a ranked list of PMIDs relevant for curation.
Tuning set : a sample of 100 kinases (subset 1), provided with a comprehensive list of relevant PMIDs for both axes.
Test set : a sample of 100 kinases (subset 2).
Evaluation : fully automatic. A citation will be judged as relevant if it was used in neXtProt, irrelevant otherwise.
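
As an illustration of how a run could be scored automatically on the tuning set, the following Python sketch computes precision at k and average precision for one ranked list of PMIDs. The choice of metrics is an assumption (this page does not state which measures the organizers use), and the PMIDs and function names are hypothetical placeholders.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked PMIDs that belong to the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for pmid in ranked[:k] if pmid in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant PMID occurs.

    Assumes the ranked list contains no duplicate PMIDs. Relevant PMIDs that
    are never retrieved contribute zero, since the sum is divided by the
    total number of relevant PMIDs.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, pmid in enumerate(ranked, start=1):
        if pmid in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Hypothetical example for one (kinase, axis) pair: placeholder PMIDs only.
ranked_run = ["111", "222", "333", "444"]   # system output, best first
gold_pmids = {"111", "333", "999"}          # PMIDs treated as relevant

print(precision_at_k(ranked_run, gold_pmids, k=2))
print(average_precision(ranked_run, gold_pmids))
```

In practice such a score would be computed per kinase and per axis, then averaged over the 100 kinases of the set; that aggregation is likewise an assumption.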

2 - Fulltexts triage

Fulltexts collection : here (BioC format)
Training data : here
Test data : here
Short description : identical to subtask 1 but with fulltexts. Maximum of 10 runs per participant.
Collection : 1 million PubMed Central fulltexts in BioC format.
Input : identical to subtask 1.
Output : a ranked list of PMCIDs relevant for curation.
Tuning set : a sample of 100 kinases (subset 1), provided with a comprehensive list of relevant PMCIDs for both axes.
Test set : a sample of 100 kinases (subset 3).
Evaluation : identical to subtask 1.

3 - Snippet selection

Snippet examples : here (Excel file), examples of snippets of varying quality written by a SIB curator
Task 3 collection : here
Test data : here
Short description : given a kinase, a curation axis, and a fulltext regarded as relevant and used for annotation in neXtProt, select a snippet of maximum 500 characters that contains enough information to make an annotation. Maximum of 3 runs per participant.
Collection : N/A
Input : triples made of a kinase (e.g. "Activin receptor type-1B" - P36896), a curation axis (biological processes or diseases), and a fulltext regarded as relevant.
Output : a snippet of maximum 500 characters that contains enough information to make an annotation.
Tuning set : a small example set of expected snippets made by SIB curators.
Test set : a sample of 100 kinases (subset 1).
Evaluation : planned for August 2017 with curators. Each submitted snippet will be judged by a curator and given one of the following three values:
  1 = good (the snippet is sufficient for making an annotation without reading the paper)
  0.5 = quite good (the curator thinks that there is a potential annotation, but needs to read the paper because the snippet is not sufficient for making the entire annotation)
  0 = irrelevant (nothing in the snippet indicates that an annotation is possible)
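
The constraints of this subtask can be sketched in Python as follows: a check that a snippet respects the 500-character limit, and a simple average of the three curator judgment values described above. Averaging the judgments is an assumption (this page does not say how the values are aggregated), and all names below are hypothetical.

```python
from typing import Iterable

MAX_SNIPPET_CHARS = 500              # limit stated in the subtask description
VALID_JUDGMENTS = {0.0, 0.5, 1.0}    # irrelevant, quite good, good


def is_valid_snippet(snippet: str) -> bool:
    """True if the snippet is non-empty and at most 500 characters long."""
    return 0 < len(snippet) <= MAX_SNIPPET_CHARS


def mean_judgment(judgments: Iterable[float]) -> float:
    """Average curator judgment over a run (assumed aggregation)."""
    values = [float(j) for j in judgments]
    if not values:
        raise ValueError("no judgments to average")
    for value in values:
        if value not in VALID_JUDGMENTS:
            raise ValueError(f"unexpected judgment value: {value}")
    return sum(values) / len(values)


# Hypothetical run with four judged snippets.
print(is_valid_snippet("x" * 500))          # within the limit
print(is_valid_snippet("x" * 501))          # one character too long
print(mean_judgment([1, 0.5, 0, 0.5]))
```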

TIMELINE

Release of collection and tuning set : DONE
Release of test set : DONE
Runs submission deadline : August 1, 2017
Curators' evaluation : August 2017
Delivery of evaluation results : September 15, 2017

TRACK ORGANIZING COMMITTEE

Dr. Julien Gobeill (SIB, Switzerland)
Dr. Pascale Gaudet (SIB, Switzerland)
Prof. Patrick Ruch (SIB, Switzerland)
