Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative V

IAT (Task 5) For Biocurators [2015-05-08]

Organizers: Cecilia Arighi, Qinghua Wang, and Lynette Hirschman


A common problem biocurators face with text mining systems is that the systems are difficult to use or do not produce output that can be used directly in the literature curation process. In this respect, the BioCreative Interactive Text Mining (IAT) task has served as a means to observe the approaches, standards, and functionalities of state-of-the-art text mining systems with potential applications in the biocuration domain. The IAT task also gives biocurators a way to be directly involved in testing text mining systems. For the upcoming BioCreative V, seven teams have each submitted a text mining/NLP system targeted to a specific biocuration task. Users will evaluate these systems formally, but not competitively.

Invitation to Participate

This is an open invitation to biocurators to participate in a user study on the system of their choice during the period June 22 to July 31, prior to the BioCreative V workshop (September 9-11, 2015). The study is conducted remotely and is time flexible. There are two levels of participation: full (total time commitment of approximately 12 h per system), which involves training, performing pre-defined tasks, curating a set of documents, and completing a user survey; and partial (total time commitment of approximately 30 min to 1 h per system), which involves performing basic pre-defined tasks at the system's website and providing feedback via a user survey.

A brief description of each participating text mining system can be found in the section Description of participating systems. A link to each system's website is provided, although the final version to be used in the biocuration task may not be ready yet. The systems will provide guidelines, tutorials, and training by the time the evaluation period starts on June 22, and results from the user evaluation should be returned by Friday, July 31. To register for participation please click here. We would appreciate it if you could sign up by June 20.

The results of this evaluation will be presented during the BioCreative V workshop, September 9-11, 2015 in Sevilla, Spain. Attendance at the workshop is not a prerequisite for participation, but you are more than welcome to attend.

Why should I participate?

The benefits to biocurators participating in this activity are multifold, including:

  • direct communication and interaction with developers
  • exposure to new text mining tools that can be potentially adapted and integrated into the biocuration workflow
  • contribution to the development of text mining systems that meet the needs of the biocuration community
  • dissemination of findings in Proceedings and a peer-reviewed journal article.

What does it take to participate?

As described above, there are two levels of participation:

1-Participation in an actual biocuration task (Full): This includes a training session to become familiar with the annotation guidelines and the website functionality via remote interaction with the text mining groups, and a testing session that involves performing pre-defined tasks and an annotation task (text mining (TM)-assisted and non-TM-assisted). The dataset can be a set of articles relevant to the biocurator's domain of expertise (to be discussed with the system developers). We expect the time commitment for this activity to be around 2 h per week in the period June 22 to July 31 (12 h total). The table of activities and timeline below is provided for guidance. As mentioned above, time is flexible; the only requirement is to finish the task by July 31. Participation in this activity entitles you to be listed as a co-author of the BioCreative IAT overview manuscript in the Proceedings (upon your agreement).

Week 1    Training with guided exercises with text mining team
Week 2    Review of task guidelines with text mining team and coordinator
Week 3    Pre-defined tasks exercise
Week 4    1h annotation exercise (non-TM assisted)/1h annotation exercise (TM-assisted)
Week 5    1h annotation exercise (non-TM assisted)/1h annotation exercise (TM-assisted)
Week 6    Survey and submission of data

2-Participation in testing the usability of the website by performing basic pre-defined tasks (Partial): The user will navigate the system to perform these tasks and will report on the usability of the website via a user survey. The user will not receive training and will rely only on the help provided by the website.

How do I register for participation?

Please check the system(s) that you are interested in (from the list below) and fill in the information required in the form. To learn more about the systems, go to the section Description of participating systems. You can always contact us with additional questions by sending an email to BioCreative IAT.

Please select one or more systems from this list:
Argo (Curation of phenotypes relevant to chronic obstructive pulmonary disease (COPD) in the PhenomeNet database)
Egas (Identification of clinical attributes associated with human inherited gene mutations, described in PubMed abstracts)
OntoGene (Curators interested in bioconcepts currently supported by OntoGene, e.g., miRNA, gene, chemical, disease)
GenDisFinder (Knowledge discovery of known/novel human gene-disease associations from biomedical literature)
MetastasisWay (Look for the biomedical concepts and relations associated with metastasis and construct the metastasis pathway)
BELIEF (A semi-automated curation interface which supports experts in relation extraction and encoding in the modelling language BEL (Biological Expression Language))
EXTRACT (List the environment type and organism name mentions identified in a given piece of text)

Please specify if you would like to be considered for Full Participation or Partial Participation for the systems selected, as explained in the above section. We would appreciate registration by June 20.

(Name *)
(Email *)
(Organization *)
* Mandatory

Description of Participating Systems

Below is a list of the systems that are participating in the upcoming BioCreative V Interactive task. Please note that the systems will be ready for testing by mid-June; until then, some websites may not yet have all functionalities available, or may present only an overview of the system. Also note that the information listed for each system is limited to what pertains to the task proposed for the interactive activity.

To participate in the user study involving one or more of these systems, please go to Registration.

Argo: Curation of phenotypes relevant to chronic obstructive pulmonary disease (COPD) in the PhenomeNet database.
Bioconcepts: medical condition
Standards: OMIM (gene/protein); Human Disease Ontology (medical condition); UMLS (sign/symptom); STITCH (drug)
Categorization: N/A
Information Extraction: disease-disease relations; drug-disease relations; gene-disease relations; sign or symptom-disease relations
Text: full-text
Browser: Chrome, Firefox, Safari
Target User: Curators of phenotypic concepts

Egas: Identification of clinical attributes associated with human inherited gene mutations, described in PubMed abstracts.
Standards: Human Phenotype Ontology (HPO); NCI Thesaurus
Categorization: N/A
Information Extraction: gene/protein-mutation relation; gene/protein-disease relation; mutation-zygosity relation; mutation-penetrance relation
Text: abstract
Browser: Chrome, Firefox, Safari
Target User: Curators of genetic variants

OntoGene: Curation of bioconcepts, such as miRNA, gene, disease and chemicals, and their relations.
Standards: RegulonDB ID; CTD (gene, chemical, disease); NCBI taxonomy (species)
Categorization: N/A
Text: full-text
Browser: Chrome, Firefox, Safari
Target User: Curators interested in bioconcepts currently supported by OntoGene

GenDisFinder: Knowledge discovery of known/novel human gene-disease associations from biomedical literature.
Standards: EntrezGene (gene/protein); OMIM (disease phenotype)
Categorization: Gene Disease Associations (GDA) into novel, known or unknown
Information Extraction: gene/protein-disease relations; GDA-related action words and network association type
Text: abstract
Browser: Chrome, Explorer, Firefox, Opera, Safari
Target User: Gene Disease association related studies and databases, Pharma companies

MetastasisWay: Look for the biomedical concepts and relations associated with metastasis and finally construct the metastasis pathway.
Bioconcepts: body part; gene expression; cell line; experimental techniques
Standards: NCBI Gene IDs (gene/protein)
Categorization: N/A
Information Extraction: positive and negative regulations of biomedical concepts associated with metastasis
Text: abstract
Browser: Chrome, Firefox
Target User: Anyone who hopes to discover the relationship between gene/gene product and metastasis through text mining

BELIEF: Semi-automated curation interface which supports experts in relation extraction and encoding in the modeling language BEL (Biological Expression Language). BEL can represent biological knowledge in causal and correlative relationships that are triples containing a subject, a predicate (relationship) and an object.
Bioconcepts: gene/protein (human, mouse, rat); biological processes
Standards: HGNC (HUGO Gene Nomenclature Committee); MGI (Mouse Genome Database); MeSH Diseases (C) Branch; ChEBI (Chemical Entities of Biological Interest); GO-Biological Process; Selventa Protein/Family Names
Categorization: N/A
Information Extraction: Relations are expressed in BEL (Biological Expression Language). Relations can be expressed between all of the detected entity types
Text: abstracts; full text can be used as well (as plain text, UTF-8 encoding)
Browser: Chrome, Firefox
Target User: Curators with a basic understanding of BEL, or users who are interested in coding relationships and want to try encoding relationships in BEL

EXTRACT: Lists the environment type and organism name mentions identified in a given piece of text.
Bioconcepts: environment descriptive terms (e.g. lagoon, desert, forest); organism mentions
Standards: Environment Ontology classes (a Genomics Standards Consortium effort); NCBI Taxonomy entries
Categorization: N/A
Information Extraction: N/A
Text: text snippets
Browser: Chrome, Firefox
Target User: metagenomics record curators, microbial/molecular ecology researchers (e.g. upon sample/sequence annotation), metagenomics resource developers (e.g. metadata suggestion module)

Travel awards

Travel awards to attend the BioCreative workshop are available for interactive task participants from US-based institutions. For details, click here.
