Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative VII

BioCreative VII challenge and workshop [2020-01-22]

BioCreative VII Challenge and Workshop CFP

The workshop will take place during the first two weeks of November 2021. The exact dates and format have not yet been set; however, there will be a virtual component.

BioCreative: Critical Assessment of Information Extraction in Biology is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. BioCreative has been an invaluable source for advancing state-of-the-art text mining methods by providing reference datasets and a collegial environment in which to develop and evaluate these methods in both shared and interactive modes. The sudden spread of COVID-19 has placed unexpected pressure on the biomedical community to quickly identify potential treatments, either by repurposing existing drugs or by identifying new chemicals with anti-SARS-CoV-2 activity. BioCreative VII will therefore focus on the detection of chemicals, drugs and related substances through three tracks: Track 1 (DrugProt) focuses on detecting interactions between chemicals/drugs/substances and genes/proteins in abstracts; Track 2 (NLM-Chem) focuses on detecting chemical names and their MeSH encoding in full-length articles; and Track 3 (Medications in Tweets) focuses on extracting medication mentions from social media.
In addition, COVID-19 has triggered the development of multiple text mining tools that support ongoing research efforts and are awaiting community feedback. We are therefore offering an interactive track, Track 4, in which tools are reviewed by users who provide feedback on their utility and usability.

More details about each track are given below; each track also has its own track-specific page.

  • Track 1 - DrugProt: Text mining drug/chemical-protein interactions
    Organizers: Martin Krallinger, Alfonso Valencia
    DrugProt will explore the recognition of chemical-protein entity relations in abstracts. The aim of the DrugProt track is to promote the development and evaluation of systems that can automatically detect relations between chemical compounds/drugs and genes/proteins. To this end, we have generated a manually annotated corpus, the DrugProt corpus, in which domain experts have exhaustively labeled: (a) all chemical and gene mentions, and (b) all binary relationships between them that correspond to a specific set of biologically relevant relation types (DrugProt relation classes).

  • Track 2 - NLM-Chem Track: Full-text chemical identification and indexing in PubMed articles
    Organizers: Rezarta Islamaj, Robert Leaman, and Zhiyong Lu, National Library of Medicine (NLM)
    Current chemical concept recognition tools have demonstrated significantly lower performance on full-text articles than on abstracts. Improving automated full-text chemical concept recognition can substantially accelerate manual indexing and curation, and advance downstream NLP tasks such as relevant article retrieval. The NLM-Chem task will consist of two sub-tasks, focusing on (1) identifying chemicals in full-text articles (i.e., named entity recognition and normalization) and (2) ranking chemical concepts for full-text document indexing. The task will use the recently released NLM-Chem corpus, which consists of 150 full-text articles with ~5,000 unique chemical names mapped to ~2,000 MeSH identifiers.

  • Track 3 - Automatic extraction of medication names in tweets
    Organizers: Graciela Gonzalez-Hernandez, Davy Weissenbacher, Ivan Flores, Karen O’Connor
    The goal of this task is to extract the spans that mention a medication or dietary supplement in tweets. The dataset consists of all tweets posted by 212 Twitter users during their pregnancy. These data reflect the natural, highly imbalanced distribution of drug mentions on Twitter, with only approximately 0.2% of the tweets mentioning a medication. Training and evaluating a sequence labeler on this dataset therefore closely models how drugs are detected in tweets in practice.

  • Track 4 - COVID-19 text mining tool interactive demo (more information available soon)
    Organizers: Cecilia Arighi, Andrew Chatr-Aryamontri, Lynette Hirschman, Martin Krallinger, Karen Ross, Tonia Korves
    The COVID-19 text mining tool interactive demo is a demonstration task focused on tools specifically conceived to support COVID-19 research efforts. As in previous interactive tasks (e.g., PMID:27589961), tools will be reviewed by the research community, who will provide feedback on their utility and usability. More details, including minimum requirements, will be made available soon.


    Teams can participate in one or more of these tracks. Team registration will continue until final commitment is requested by the individual tracks.
    To register a team, go to the team registration page. If you have difficulty accessing Google Forms, please send an e-mail to
    Note: The BioCreative site has a Team page link; please ignore it. Registration for this edition is done via Google Forms.


  • Cecilia Arighi, University of Delaware, USA
  • Andrew Chatr-Aryamontri, University of Montreal, Canada
  • Rezarta Dogan, National Center for Biotechnology Information (NCBI), NIH, USA
  • Graciela Gonzalez-Hernandez, University of Pennsylvania, USA
  • Lynette Hirschman, MITRE Corporation, USA
  • Martin Krallinger, Barcelona Supercomputing Center, Spain
  • Robert Leaman, National Center for Biotechnology Information (NCBI), NIH, USA
  • Zhiyong Lu, National Center for Biotechnology Information (NCBI), NIH, USA
  • Karen Ross, Georgetown University Medical School, USA
  • Alfonso Valencia, Barcelona Supercomputing Center, Spain
  • Davy Weissenbacher, University of Pennsylvania, USA