Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative VII

BioCreative VII challenge and workshop (Events) [2020-01-22]

Note for BioCreative participants: To register for a track, please use the Google form.
Do not use the "Team page" tab, as it is non-functional.

BioCreative VII Challenge and Workshop CFP

The workshop will take place on November 8-10, 2021. This workshop will be virtual.

BioCreative: Critical Assessment of Information Extraction in Biology is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. BioCreative has been an invaluable resource for advancing state-of-the-art text mining methods by providing reference datasets and a collegial environment to develop and evaluate these methods in both shared and interactive modes. The sudden spread of COVID-19 has put unexpected pressure on the biomedical community to quickly identify potential treatments by repurposing existing drugs or identifying new chemicals with anti-SARS-CoV-2 activity. BioCreative VII will therefore focus on the detection of chemicals, drugs and related substances, with three tracks: Track 1 (DrugProt) focuses on detecting interactions between chemicals/drugs/substances and genes/proteins in abstracts; Track 2 (NLM-Chem) focuses on detecting chemical names and their MeSH encoding in full-length articles; and Track 3 (Medications in Tweets) focuses on extracting medication mentions from social media.
In addition, COVID-19 has triggered the development of multiple text mining tools to support ongoing research efforts, and these tools await community feedback. We are therefore offering an interactive track, Track 4, in which users review tools and give feedback on their utility and usability. We further offer Track 5, the LitCovid track on multi-label topic classification for COVID-19 literature annotation, which calls for innovative text mining tools to support the curation of COVID-19 literature in LitCovid, a literature database of COVID-19-related papers in PubMed.

More details about the tracks follow. Click on a track number to access its track-specific page:

  • Track 1- DrugProt: Text mining drug/chemical-protein interactions
    Organizers: Martin Krallinger, Alfonso Valencia
    DrugProt will explore recognition of chemical-protein entity relations from abstracts. The aim of the DrugProt track is to promote the development and evaluation of systems that are able to automatically detect relations between chemical compounds/drugs and genes/proteins. We have therefore generated a manually annotated corpus, the DrugProt corpus, in which domain experts have exhaustively labeled: (a) all chemical and gene mentions, and (b) all binary relationships between them corresponding to a specific set of biologically relevant relation types (DrugProt relation classes).

  • Track 2- NLM-Chem Track: Full text Chemical Identification and Indexing in PubMed articles
    Organizers: Rezarta Islamaj, Robert Leaman, and Zhiyong Lu, National Library of Medicine (NLM)
    Current chemical concept recognition tools have demonstrated significantly lower performance in full-text articles than in abstracts. Improving automated full-text chemical concept recognition can substantially accelerate manual indexing and curation and advance downstream NLP tasks such as relevant article retrieval. The NLM-CHEM task will consist of two sub-tasks, focusing on (1) identifying chemicals in full-text articles (i.e. named entity recognition and normalization) and (2) ranking chemical concepts for full-text document indexing. The task will use the recently released NLM-CHEM corpus, consisting of 150 full-text articles, with ~5,000 unique chemical names mapped to ~2,000 MeSH identifiers.

  • Track 3- Automatic extraction of medication names in tweets
    Organizers: Graciela Gonzalez-Hernandez, Davy Weissenbacher, Ivan Flores, Karen O’Connor
    The goal of this task is to extract the spans that mention a medication or dietary supplement in tweets. The dataset consists of all tweets posted by 212 Twitter users during their pregnancy. This data represents the natural and highly imbalanced distribution of drug mentions on Twitter, with only approximately 0.2% of the tweets mentioning a medication. Training and evaluating a sequence labeler on this dataset will closely model the detection of drugs in tweets in practice. Click here for more information.

  • Track 4- COVID-19 text mining tool interactive demo
    Organizers: Cecilia Arighi, Andrew Chatr-Aryamontri, Lynette Hirschman, Martin Krallinger, Karen Ross, Tonia Korves
    The COVID-19 text mining tool interactive demo track is a demonstration task, and will focus on tools specifically developed to support COVID-19 research efforts. Similar to previous interactive tasks (e.g., PMID:27589961), tools will be reviewed by the research community, providing feedback on effectiveness and usability.
    The goal of this task is to foster interaction between system developers and potential users in order to advance the development of text mining tools that are useful for the research community. Participating teams will present a web-based system that can address one or more tasks of their choice. Users will be recruited to review the system and provide feedback via a user questionnaire. More information here.

  • Track 5- LitCovid track: Multi-label topic classification for COVID-19 literature annotation
    Organizers: Qingyu Chen, Alexis Allot, Rezarta Islamaj, Robert Leaman, and Zhiyong Lu, National Library of Medicine (NLM)
    The number of COVID-19-related articles in the literature is growing by about 10,000 articles per month. LitCovid, a literature database of COVID-19-related papers in PubMed, has accumulated more than 100,000 articles, with millions of accesses each month by users worldwide. LitCovid is updated daily, and this rapid growth significantly increases the burden of manual curation. In particular, annotating each article with up to eight possible topics, e.g., Treatment and Diagnosis, has been a bottleneck in the LitCovid curation pipeline. Increasing the accuracy of automated topic prediction in COVID-19-related literature would be a timely improvement beneficial to curators and researchers worldwide. The LitCovid track calls for a community effort to tackle automated topic annotation for COVID-19 literature. The task will use ~60K articles in LitCovid with manually reviewed topics.


    The BioCreative VII Proceedings will include all submissions from participating teams and will be freely available by the time of the workshop.
    In addition, we are happy to announce that the journal Database will host the BioCreative VII special issue for work that has passed its peer-review process. Invitations to submit will be sent after the workshop.


    Teams can participate in one or more of these tracks. Team registration will continue until final commitment is requested by the individual tracks.
    To register a team, go to the Registration form. If you have trouble accessing Google Forms, please send e-mail to
    Note: The BioCreative site has a Team page link; please ignore it, as it is non-functional. Registration is done via Google Forms this time.


  • Cecilia Arighi, University of Delaware, USA
  • Andrew Chatr-Aryamontri, University of Montreal, Canada
  • Rezarta Dogan, National Center for Biotechnology Information (NCBI), NIH, USA
  • Graciela Gonzalez-Hernandez, University of Pennsylvania, USA
  • Lynette Hirschman, MITRE Corporation, USA
  • Martin Krallinger, Barcelona Supercomputing Center, Spain
  • Robert Leaman, National Center for Biotechnology Information (NCBI), NIH, USA
  • Zhiyong Lu, National Center for Biotechnology Information (NCBI), NIH, USA
  • Karen Ross, Georgetown University Medical School, USA
  • Alfonso Valencia, Barcelona Supercomputing Center, Spain
  • Davy Weissenbacher, University of Pennsylvania, USA


    ChemProt corpus: BioCreative VI (Resources) [2017-11-21]

    Text mining chemical-protein interactions (CHEMPROT) corpus, including:
    • ChemProt sample set
    • ChemProt training set
    • ChemProt development set
    • ChemProt test set


    BioCreative VI


    BioCreative VI Workshop Registration and Information (Events) [2017-07-05]

    The BioCreative VI workshop will be held at the DoubleTree by Hilton Hotel Bethesda in Bethesda, Maryland, on October 18-20, 2017.


    Registration for the BioCreative VI Workshop is now closed.

    Registration includes:

  • Breakfast
  • Attendance at the workshop
  • Coffee breaks
  • Lunch
  • Tour to the National Library of Medicine
    Registration is in US dollars:

    Registration Type | Early fee (until September 18) | Late fee (after September 18)


    Travel Awards

    Funds of up to $700 are available for US participants to attend the BioCreative workshop. To apply, complete the application available here by September 1st. Women, under-represented minorities, students, and post-doctoral fellows are encouraged to apply. Submissions are now closed.


    Scientific Program

    Venue: DoubleTree by Hilton Hotel, Bethesda, Maryland
    Talks: Ballroom D (2nd floor)
    Posters: Balance room (2nd floor)

    The scientific program includes talks related to the individual tracks, a panel on innovation in biomedical digital curation, a general session for BioCreative-related topics, two keynote talks and a poster session. The detailed agenda is shown below.

    Wednesday, October 18, 2017

    08:00 - 12:00 Registration (EMC foyer, 2nd floor)
    08:00 - 09:00 Breakfast (EMC foyer, 2nd floor)
    09:00 - 09:15 Workshop opening PDF
    09:15 - 10:40 TRACK 1 Interactive Bio-ID Assignment
    9:15-9:25 Introduction to Bio_ID track (Cecilia Arighi, University of Delaware) PDF
    9:25-9:40 Introduction to SourceData and data set (Thomas Lemberger, SourceData) PDF
    9:40-9:55 Overview of Bio_ID results – batch results (Lynette Hirschman, MITRE) PDF
    9:55-10:05 A Neural Named Entity Recognition Approach to Biological Entity Identification (Emily Sheng, Information Sciences Institute/USC) PDF
    10:05-10:15 A Study on Identification of Organism and micro-RNA Mentions in Figure Captions (Po-Ting Lai, National Central University) PDF
    10:15-10:40 Next steps and discussion
    10:40 - 11:00 Break
    11:00 - 12:30 Panel on Innovation on Digital Curation (Moderators: Fabio Rinaldi and Cecilia Arighi)
    11:00-11:10 Julio Collado-Vides, UNAM- Natural language processing to enhance accessibility to knowledge in RegulonDB
    11:10-11:20 Thomas Lemberger, EMBO- Data transparency in scientific publishing PDF
    11:20-11:30 Zhiyong Lu, NCBI- Text mining for improving the prioritization, curation, and integration of knowledge for clinically relevant variants PDF
    11:30-11:40 Johanna McEntyre, EMBL-EBI- How can text mining scale to meet diverse and precise curation needs? PDF
    11:40-12:30 Open discussion
    12:30 - 13:40 Lunch (Restaurant first floor)
    13:40 - 15:40 TRACK 4 Mining protein interactions and mutations for precision medicine
    1:40-2:10 Overview of the Precision Medicine Track (Rezarta Islamaj Dogan) PDF
    2:10-2:30 Identifying Relevant Literature for Precision Medicine Using Deep Neural Networks (Sergio Matos)
    2:30-2:50 Exploring a Deep Learning Pipeline for the BioCreative VI Precision Medicine Task (Tung Tran)
    2:50-3:10 Mining protein interactions affected by mutations using a NLP based machine learning approach (Albert Steppi and Jinchan Qu)
    3:10-3:30 Document Triage and Relation Extraction for Protein-Protein Interactions affected by Mutations (Dina Demner Fushman for Karin Verspoor and team)
    3:30-3:40 Poster spotlight and Open discussion
    15:40 - 16:00 Break
    16:00 - 17:00 Keynote- Dr. Patricia Flatley Brennan, Director of National Library of Medicine, NIH
    Towards a future of data-powered health Zip file
    17:00 - 18:00 Panel on Funding Stakeholders (Moderator: Cathy Wu)
    Susan Gregurick, NIGMS, NIH PDF
    Jennifer Weller, NSF
    Jane Ye, NLM, NIH PDF
    Johanna McEntyre, EMBL-EBI

    Thursday, October 19, 2017

    08:00 - 12:00 Registration (EMC foyer, 2nd floor)
    08:00 - 09:00 Breakfast (EMC foyer, 2nd floor)
    09:00 - 10:15 General session
    9:00-9:20 Efficient and Accurate Entity Recognition for Biomedical Text (Fabio Rinaldi, U. Zurich)
    9:20-9:40 iTextMine: Integrated Text Mining System for Large-Scale Knowledge Extraction from Literature (Jia Ren, U. Delaware)
    9:40-10:00 Large-scale Automated Reading with Reach Discovers New Cancer Driving Mechanisms (Dane Bell, U. Arizona)
    10:00-10:20 Evaluating without a Gold Standard (Lynette Hirschman, MITRE)
    10:20 - 10:40 Break
    10:40 - 12:30 TRACK 2 Text-mining services for Kinome Curation
    10:40-11:10 Kinome Track Overview (Julien Gobeill & Patrick Ruch, SIB)
    11:10-11:35 Assisting Document Triage for Human Kinome Curation via Machine Learning (Alan Hsu, NCBI)
    11:35-11:55 KinDER: A Biocuration Tool for Extracting Kinase Knowledge from Biomedical Literature (Adam Morrone & Daniel Dopp, Montana State University)
    11:55-12:20 Discussion
    12:30 - 13:45Lunch (Restaurant first floor)
    13:45 - 15:45 TRACK 3 Extraction of causal network information using the Biological Expression Language
    1:45 - 2:15 BEL Track - Overview and Results (Sumit Madan, Fraunhofer SCAI, Germany) PDF
    2:15 - 2:30 Task 1 - BELMiner – Information extraction system to extract BEL relationships (Ravikumar Komandur Elayavilli, Mayo Clinic, USA) PDF
    2:30 - 2:45 Task 1 - Generating Biological Expression Language Statements with Pipeline Approach and Different Parsers (Po-Ting Lai, National Central University, Taiwan) PDF
    2:45 - 3:00 Task 1 - Automatic Extraction of BEL-Statements based on Neural Networks (Mehdi Ali, University of Bonn, Germany) PDF
    3:00 - 3:15 Task 2 - Semantic Information Retrieval: Exploring dependency and word embedding features in biomedical Information Retrieval (Majid Rastegar-Mojarad, Mayo Clinic, USA) PDF
    3:15 - 3:30 BEL Track - Next steps and discussion (Sumit Madan, Fraunhofer SCAI, Germany) PDF
    15:45 - 16:00 Break
    16:00 - 17:00 Keynote- Dr. Hongfang Liu, Professor Biomedical Informatics, Mayo Clinic
    Text mining in precision medicine: opportunities and challenges
    17:00 - 19:00 Poster session and Reception

    Friday, October 20, 2017

    08:00 - 09:00 Breakfast (EMC foyer, 2nd floor)
    08:30 - 10:30 TRACK 5 Text mining chemical-protein interactions
    8:30-9:00 Overview of the Chemical-Protein relation extraction track (Martin Krallinger / Saber A. Akhondi, CNIO / Elsevier) PDF
    9:00-9:15 Chemical-protein relation extraction with SVM, CNN, RNN and ensemble systems (Yifan Peng, NCBI, NLM, NIH) PDF
    9:15-9:30 Extracting Chemical-Protein Interactions using Long Short-Term Memory Networks (Sérgio Matos, University of Aveiro, Portugal) PDF
    9:30-9:45 Attention based Neural Networks for Chemical Protein Relation Extraction (Ravikumar Komandur Elayavilli, Mayo Clinic, USA)
    9:45-10:00 Extracting protein-chemical compound interactions from literature (Pei-Yau Lung, Florida State University)
    10:00-10:15 Knowledge-base-enriched relation extraction (Ignacio Tripodi, University of Colorado, Boulder) PDF
    10:15 - 10:30 CTCPI - Convolution Tree Kernel-based Chemical-Protein interaction detection (Po-Ting Lai, National Tsing-Hua University, Hsinchu, Taiwan)
    10:30 - 10:45 Break
    10:45 - 12:00 Tracks open discussion/closing
    12:00 - 13:00 Lunch (Restaurant first floor)
    13:00 - 15:30 Tour to the National Library of Medicine


    Work Submissions

    Abstracts and papers should be submitted via EasyChair at the following URL: Format for submission: PDF.

    BioCreative VI proceedings will be published online and will be available at the time of the meeting.

    We will invite selected works for full publication in the Database Journal Special Issue for BioCreative.

    Track presentation submissions: Track participants should submit their work in the form of a short paper (up to 4 pages). See the template with instructions here. Although the template is a Word document, submissions should be in PDF format.

    Submission deadline: please follow the dates given for the track in which you are participating.

    General session submissions: BioCreative VI will host a general session to provide an opportunity to discuss work related to BioCreative topics, including data set preparation, evaluation, and biomedical information extraction. We are therefore soliciting submissions of your bioNLP work related to any of these topics in the form of a 2-page abstract (a template with instructions can be found here). Although the template is a Word document, submissions should be in PDF format. Abstracts will be selected for oral presentation during the corresponding session.

    Submission deadline: September 17, 2017

    Poster size: 32" x 40"


    Venue and Access

    Conference venue | Map

    DoubleTree by Hilton Hotel Bethesda - Washington DC
    8120 Wisconsin Avenue, Bethesda, Maryland 20814
    Phone: (301) 652-2000


    There are a number of rooms reserved for this event at the DoubleTree hotel with a special rate of US$219 per night. This special rate is available through September 22, 2017.
    Guests can reserve online through the BioCreative hotel reservation page (preferred).
    Guests can call 1-800-955-7359 and request the group rate for the BioCreative VI Conference or the Group SRP/Code: BC0.
    Alternatively, go to the hotel main page, click on the Reservations tab, enter your dates, and in the Special Accounts section enter BC0 as the Group/Convention Code.
    The special rate applies for the nights 10/17/2017 through 10/20/2017.


    Map & Directions to the hotel from local airports:

    The DoubleTree Hotel in Bethesda is conveniently located near three airports. It is just 15 miles from Ronald Reagan Washington National Airport (DCA), 27 miles from Washington Dulles International Airport (IAD), and 38 miles from Baltimore Washington International Airport (BWI). All three airports should have private shuttle services available. Fees for these services vary based on distance to the hotel.

    Transportation options from DCA
    Transportation options from IAD
    Transportation options from BWI

    Maps and directions


    The hotel offers paid parking ($12 for day parking and $23 for overnight parking). There are also metered spaces and public parking throughout the Bethesda area. To check parking locations and rates, click here.



    You are responsible for checking the visa requirements to enter the US. General information about US visas can be found here. If you need a letter of invitation to attend the workshop, please send an email to with the subject: visa letter request.



    The BioCreative workshop is sponsored by

  • NIH/NIGMS Conference grant 1R13GM109648-01A1
  • NIH/NIGMS Supplement 2R01GM08064
  • Database, Oxford University Press
  • International Society for Biocuration
  • Elsevier
  • OpenMinted (654021) H2020 project
  • encomienda MINETAD-CNIO/OTG Sanidad Plan TL

    BioCreative VI Workshop Proceedings

    The proceedings are now available here.
