Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative III

PPI: Protein-Protein Interactions [2009-12-08]

PPI Task Introduction

The aim of this task is to promote the development of automated systems that are able to extract biologically relevant information directly from the literature, in this case related to protein-protein interaction (PPI) annotation information. The resulting text mining tools should be able to improve access to this information for database curators, experimental biologists as well as bioinformaticians.

In order to reinforce the construction of practically usable applications, we especially encourage the implementation of online systems that produce results in predefined prediction formats and within processing time constraints, to facilitate direct comparison and integration of automatically generated results. The posed PPI tasks are directly inspired by the needs of biologists and database curators, following user demands that are based on the general steps underlying the PPI annotation workflow. These tasks cover (1) the selection of relevant articles from PubMed (Article Classification Task - ACT) and (2) the linking of articles to important experimental methods, i.e. interaction detection methods (Interaction Method Task - IMT).

The BioCreative III PPI corpora are available here.

ACT-BC-III (Article Classification Task – BioCreative III)

A common user need is to determine which records in a given collection of articles are actually PPI relevant, i.e. contain descriptions indicating that the article is relevant for PPI curation. Such a collection may be defined as a list of PubMed records derived from a keyword search (e.g. using a particular disease term) or from a journal of interest. The same need arises when considering a list of articles related to a protein of interest for which all known interactions need to be extracted, or when a periodic literature curation process is followed and biologists want to determine which articles for a given time span (e.g. a month) are curation relevant.

To promote the development of such systems, participating teams will be provided with a collection of recent PubMed records derived from a list of journals that had articles which were used by PPI annotation databases in the past (curation relevant journals). Note: The initial setting of this task has been slightly modified to make resulting systems more practically relevant. Analyzing records from one month of PubMed abstracts with links to free full text articles (initial idea) resulted in a collection that only covered a minor fraction of PPI relevant journals. Less than 5% of the records were PPI relevant in general, and an even smaller set was PPI annotation relevant, as most articles were actually related to the clinical domain. Note that participating systems can also use information from the corresponding full text articles in case their pipeline is able to get access to them, but from the evaluation perspective, only information from the abstracts will be considered.


Evaluation will be based on a comparison between automatically generated results and manual examination of a set of PubMed records. The setup for evaluating the systems will be similar to that of BCII.5. Additionally, we plan to evaluate how much time can be saved by using the automatic predictions as compared to unassisted manual classification. Therefore, the amount of time spent manually labeling the abstracts will be recorded. One can then evaluate systems in terms of how long it would have taken to review the ranked list of articles submitted by participating systems in order to classify all the relevant records. The manual classification will be based on predefined curation guidelines that are refined through feedback from professional biocurators. To determine the difficulty and consistency of this task, an inter-annotator agreement (IAA) study will be carried out.
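
The time-based evaluation idea can be illustrated with a minimal sketch. The exact measure is not defined here, so the following is only one possible reading: a curator reads the ranked list from the top and stops once every relevant record has been seen. The function name, the per-abstract timings and the article identifiers are hypothetical.

```python
def review_time_to_full_recall(ranked_ids, relevant_ids, seconds_per_abstract):
    """Time spent reading a ranked list top-down until all relevant
    records have been seen (one possible reading of the measure)."""
    remaining = set(relevant_ids)
    elapsed = 0.0
    for article_id in ranked_ids:
        if not remaining:
            break  # every relevant record has already been reviewed
        elapsed += seconds_per_abstract[article_id]
        remaining.discard(article_id)
    if remaining:
        raise ValueError("ranked list does not contain all relevant records")
    return elapsed


# Hypothetical example: four abstracts with recorded labeling times (seconds).
times = {"a": 30, "b": 20, "c": 40, "d": 10}
assisted = review_time_to_full_recall(["a", "b", "c", "d"], {"a", "c"}, times)
unassisted = float(sum(times.values()))  # reviewing everything without ranking
time_saved = unassisted - assisted
```

A better ranking places the relevant records earlier, so the assisted time shrinks and the saving relative to unassisted review grows.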

ACT results have to be reported as four tab-separated columns:

  1. Article identifier.
  2. Classification result (0 for negative, 1 for positive hits).
  3. Unique rank of that classification result in the range [1..Nc], where Nc is the total number of hits for negative (c=0) or positive (c=1) results.
  4. Confidence for that classification in the range ]0..1], i.e., excluding zero-confidence.
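
As an illustration of the four-column format, the following minimal sketch turns per-article relevance scores into ACT result lines. It assumes a hypothetical classifier that outputs a probability of PPI relevance per article; the 0.5 threshold, the use of the predicted-class probability as confidence, the function name and the article identifiers are all assumptions, not part of the task definition.

```python
def format_act_results(scores, threshold=0.5):
    """Build ACT result lines from {article_id: probability of relevance}.

    Ranks are unique within each class (1..Nc) and confidences are
    kept inside ]0..1].
    """
    rows = []
    for label in (1, 0):
        # Confidence is the probability assigned to the predicted class.
        members = [
            (article_id, p if label == 1 else 1.0 - p)
            for article_id, p in scores.items()
            if (p >= threshold) == (label == 1)
        ]
        # Most confident first; ties broken by identifier so ranks stay unique.
        members.sort(key=lambda item: (-item[1], item[0]))
        for rank, (article_id, conf) in enumerate(members, start=1):
            conf = min(max(conf, 1e-6), 1.0)  # exclude zero confidence
            rows.append(f"{article_id}\t{label}\t{rank}\t{conf:.6f}")
    return rows


# Hypothetical identifiers and scores.
example_rows = format_act_results({"100": 0.9, "101": 0.2, "102": 0.7, "103": 0.4})
print("\n".join(example_rows))
```

Positive and negative hits are ranked separately, so each class has its own rank 1.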

Three data collections will be distributed to the participants.

a) ACT Training set:
A balanced collection of recent articles that are PPI relevant (based on manual inspection of abstracts + recent articles used for PPI annotation by databases) and PPI non-relevant articles (based on manual inspection). This data set consists of a total of 1140 relevant and 1140 non-relevant cases. Note: Relevance in this context relates to protein-protein interactions, and NOT to genetic interactions or interactions between proteins and other bio-entities that are not proteins. This means that protein-DNA, protein-RNA, protein-compound, protein-cellular structure, etc. are considered non-relevant.

The PPI-ACT Training Data is available at: PPI-ACT Training Data

b) ACT Development set:
At the end of July, a development set of 4000 manually labeled abstracts will be released, sampled from the same pool as the test set collection. This data set will reflect the real class imbalance as observed in the test set (NOT balanced!).
The PPI-ACT Development Data is available at: PPI-ACT Development Data

c) ACT Test set:
We plan to release the test set on August 9th 2010. Its estimated size is 6000 manually labeled abstracts sampled from the same pool as the development collection.

IMT-BCIII (Interaction Method Task – BioCreative III)

A crucial aspect for the correct annotation of experimentally determined protein interactions is to determine the technique described in the article to support a given interaction. Experimental techniques or qualifiers are also relevant for other annotations, such as Gene Ontology (evidence codes). This is also important to correctly associate the article to controlled vocabulary terms relevant for biology. This task will be similar in essence to the Interaction Method Subtask of BioCreative II.

For protein-protein interaction annotation, efforts have been made to develop a controlled vocabulary of interaction detection methods in order to standardize the terminology used to describe the experimental evidence supporting an interaction. A considerable amount of curation work is devoted to the manual extraction of the experimental evidence supporting protein interaction pairs described in articles. For this task, we will ask participants to provide, for each full text article, a ranked list of interaction detection methods, defined by their corresponding unique concept identifier from the PSI-MI ontology. To browse this ontology, please refer to:

To download the MI ontology please refer to the MI obo download file:

IMT results are to be returned in five tab-separated columns, consisting of:

  1. Article identifier
  2. Interaction Detection Method MI identifier
  3. Unique rank in the range [1..N], where N is the total number of hits for that article.
  4. Confidence for that concept in the range ]0..1], i.e., excluding zero-confidence.
  5. Evidence string (max 500 characters) derived from the full text paper

The provided training set will consist of over 1000 full text articles with their corresponding interaction detection method identifiers. The test set will consist of over 100 full text articles for which interaction detection methods need to be returned by participating systems. Evaluation criteria will be similar to the INT task of BioCreative II.5.

For example predictions, please refer to the lines below (one prediction per tab-separated line):

18653891 MI:0006 1 0.91 Co-immunoprecipitation of BRI1 with BSK3
18653891 MI:0424 2 0.82 In vitro kinase assays demonstrated that BRI1, but not BAK1, phosphorylates BSK1
18653891 MI:0809 3 0.74 cells coexpressing BRI1-nYFP and BSK1-cYFP showed strong BiFC fluorescence at the plasma membrane
11404324 MI:0018 1 0.96 In this analysis, we show that HIR1 interacts with ASF1 in a two-hybrid analysis
16091426 MI:0018 1 0.97 We screened a human brain cDNA library using the two-hybrid method to identify proteins that interact with Sec23Ap

IMT additional notes

Extra Data: In addition to the training and development sets, an extra collection derived from the BioCreative II IMS task is provided as an additional resource. It can be used to exploit the HTML format available for these articles, which is not available for the BCIII-IMT training and development sets but is available for the test collection as an additional resource. PDF, PubMed XML, and plain text formats are produced as for the other datasets. PPI-IMT Extra Data (BCII)

Note 1: The confidence scores MUST be a number between 0 and 1 (the interval (0,1], excluding 0 but including 1) as specified in the submission guidelines. If the confidence score provided by a system is not between 0 and 1, the prediction will be considered as not valid.

Note 2: As specified in the format guidelines, each line in your submission file (your predictions) should correspond to a single prediction for a single article. When you assign multiple MIs to a single article, each MI prediction must form a new line, and the line must start with the article's PMID. Make sure you are compliant with these formatting guidelines.

Note 3: More than one method identifier might be associated with each paper. Multiple DIFFERENT methods for the same paper may occur due to the fact that the same interaction may be characterized/confirmed by several different techniques.

Note 4: For EACH DOCUMENT any MI identifier may be associated only ONCE. That is, only a SINGLE piece of evidence text should be provided for a PMID, MI pair. Ideally, this evidence text would be the one which constitutes the best evidence for the method. It is not allowed to predict the same MI multiple times for a single document.
So predictions like the following are NOT allowed:
18653891 MI:0006 1    0.91     [some evidence text 1]
18653891 MI:0006 2    0.81     [some evidence text 2]
18653891 MI:0006 3    0.71     [some evidence text 3]

Note 5: The evidence text you provide must be a contiguous string of text, and NOT a collection of multiple text spans taken from the whole article. It must also not contain newline or tab characters.

Note 6: As indicated in Note 4, each MI identifier can occur only once for a single article. Consequently, the assigned MIs per document are ranked in the range [1..N], where N is the total number of different method ids associated with a given article. (In this sense this task is similar to the interactor normalization task setting of BC2.5, with the exception that in this case instead of protein ids we ask for method ids (PSI-MI identifiers), and that we also ask for a single evidence text for human interpretation of predictions.)

Note 7: If you still have doubts, you can send us some example predictions generated for the development set and we can cross-check that the submission format is valid.
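
The format rules in the notes above can also be checked locally before submission. The following is a minimal sketch of such a check, not an official validator: the function name, the error messages and the assumption that method identifiers look like "MI:" followed by four digits (as in the examples above) are ours.

```python
import re
from collections import defaultdict

MI_PATTERN = re.compile(r"^MI:\d{4}$")  # assumed identifier shape, e.g. MI:0006


def validate_imt_lines(lines):
    """Return a list of format problems found in IMT prediction lines."""
    errors = []
    seen_pairs = set()
    ranks = defaultdict(list)
    for n, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            # A tab inside the evidence text also ends up here (Note 5).
            errors.append(f"line {n}: expected 5 tab-separated columns")
            continue
        pmid, mi, rank, conf, evidence = fields
        if not MI_PATTERN.match(mi):
            errors.append(f"line {n}: unexpected method identifier {mi!r}")
        try:
            conf_value = float(conf)
        except ValueError:
            conf_value = None
        if conf_value is None or not (0.0 < conf_value <= 1.0):
            errors.append(f"line {n}: confidence must be in ]0..1]")  # Note 1
        if (pmid, mi) in seen_pairs:
            errors.append(f"line {n}: {mi} predicted more than once for {pmid}")  # Note 4
        seen_pairs.add((pmid, mi))
        if len(evidence) > 500:
            errors.append(f"line {n}: evidence longer than 500 characters")
        if rank.isdigit():
            ranks[pmid].append(int(rank))
        else:
            errors.append(f"line {n}: rank must be a positive integer")
    for pmid, values in ranks.items():
        if sorted(values) != list(range(1, len(values) + 1)):
            errors.append(f"{pmid}: ranks must be unique and cover 1..N")  # Note 6
    return errors


problems = validate_imt_lines([
    "18653891\tMI:0006\t1\t0.91\tCo-immunoprecipitation of BRI1 with BSK3",
    "18653891\tMI:0006\t2\t0.81\t[some evidence text 2]",
])
print(problems)
```

In this usage example the second line repeats MI:0006 for the same article, so the check reports the duplicate described in Note 4.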


Participation is the same as for BCII.5: both online and offline participation are allowed, but online systems are preferred for all tasks.

Number of runs (submissions per team)

We will allow the same number of runs per team as for BCII.5, i.e. 5 submissions per team for each of the Online and Offline versions. This means that the maximum number of runs for a given team is 5 Online runs and 5 Offline runs.

BCIII PPI Task organization

  • Martin Krallinger, CNIO
  • Florian Leitner, CNIO
  • Miguel Vazquez, CNIO
  • Alfonso Valencia, CNIO
  • Andrew Chatr-Aryamontri, BioGRID (University of Edinburgh)
  • Gianni Cesareni, MINT (University of Rome Tor Vergata)
  • Luana Licata, MINT (University of Rome Tor Vergata)
  • Livia Perfetto, MINT (University of Rome Tor Vergata)
  • Marta Iannuccelli, MINT (University of Rome Tor Vergata)
  • Leonardo Briganti, MINT (University of Rome Tor Vergata)