Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative VII

BioCreative VII challenge and workshop [2020-01-22]

September 2020, Barcelona, Spain

BioCreative: Critical Assessment of Information Extraction in Biology is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. BioCreative VII will focus on the detection of chemicals, drugs, and other substances in abstracts, full-length articles, and social media. It will run the following tracks:

  • Track 1: NAME
    Organizers: Martin Krallinger

  • Track 2: BioCreative 2020 NLM-Chem Track
    Organizers: Zhiyong Lu, Rezarta Islamaj, Robert Leaman, National Library of Medicine (NLM)

    Automatic extraction of the chemicals mentioned in journal articles has the potential to improve retrieval of relevant articles and to greatly speed up manual indexing and curation.

    Despite multiple attempts, current natural language processing (NLP) tools for chemical concept recognition show suboptimal performance when applied to full-text articles, due to the complexity of full text. We propose a new challenge task for BioCreative 2020, focusing on identifying and ranking chemicals in full text for document indexing. The proposed task will use the newly annotated NLM-CHEM corpus, which consists of 150 full-text articles, with ~5,000 unique chemical names mapped to ~2,000 MeSH identifiers.

  • Track 3: Automatic extraction of medication names in tweets
    Organizers: Graciela Gonzalez-Hernandez, Davy Weissenbacher, Ivan Flores, Karen O’Connor
    The goal of this task is to extract the spans that mention a medication or dietary supplement in tweets. The dataset consists of all tweets posted by 212 Twitter users during their pregnancy. This data represents the natural and highly imbalanced distribution of drug mentions on Twitter, with only approximately 0.2% of the tweets mentioning a medication. Training and evaluating a sequence labeler on this dataset will closely model the detection of drugs in tweets in practice.


    Teams can participate in one or more of these tracks. Team registration will continue until the individual tracks request final commitment.
    To register a team, go to the team registration page.


  • Cecilia Arighi, University of Delaware, USA
  • Graciela Gonzalez-Hernandez, University of Pennsylvania, USA
  • Lynette Hirschman, MITRE Corporation, USA
  • Martin Krallinger, Spanish National Cancer Centre, CNIO, Spain
  • Zhiyong Lu, National Center for Biotechnology Information (NCBI), NIH, USA
  • Alfonso Valencia, Spanish National Cancer Centre, CNIO, Spain
  • Davy Weissenbacher, University of Pennsylvania, USA