BioCreative - Track 5 - LitCovid track Multi-label topic classification for COVID-19 literature annotation

Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative VII

Track 5 - LitCovid track Multi-label topic classification for COVID-19 literature annotation [2021-03-30]

Note for BioCreative participants: To register for a track, please use the Google form.
Do not use the "Team page" tab, as it is non-functional.

The rapid growth of biomedical literature poses a significant challenge for manual curation and interpretation. This challenge has become more evident during the COVID-19 pandemic: the number of COVID-19-related articles in the literature is growing by about 10,000 articles per month. LitCovid, a literature database of COVID-19-related papers in PubMed, has accumulated more than 100,000 articles, with millions of accesses each month by users worldwide. LitCovid is updated daily, and this rapid growth significantly increases the burden of manual curation. In particular, annotating each article with up to eight possible topics, e.g., Treatment and Diagnosis, has been a bottleneck in the LitCovid curation pipeline.

This track calls for a community effort to tackle automated topic annotation for COVID-19 literature. Topic annotation in LitCovid is a standard multi-label classification task that assigns one or more labels to each article. These topics have been demonstrated to be effective for information retrieval and have been used in many downstream applications related to LitCovid. However, annotating these topics has been a primary bottleneck for manual curation. Increasing the accuracy of automated topic prediction in COVID-19-related literature would be a timely improvement beneficial to curators and researchers worldwide.
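To make the multi-label setting concrete, here is a minimal sketch of how an article's topics might be encoded as a 0/1 vector and how per-topic scores might be turned back into labels. The topic list is illustrative only: "Treatment" and "Diagnosis" are named above, while "Prevention" is an assumed placeholder. The function names and the 0.5 threshold are hypothetical and not part of any track resource.

```python
# Illustrative label set: "Treatment" and "Diagnosis" appear in the track
# description; "Prevention" is an assumed placeholder for the other topics.
TOPICS = ["Treatment", "Diagnosis", "Prevention"]


def to_multi_hot(labels, topics=TOPICS):
    """Encode a collection of topic labels as a 0/1 vector, one slot per topic."""
    unknown = set(labels) - set(topics)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if topic in labels else 0 for topic in topics]


def from_scores(scores, threshold=0.5, topics=TOPICS):
    """Convert per-topic scores to labels.

    Each topic is decided independently, so an article can receive
    several labels at once (or none, if no score reaches the threshold).
    """
    return [topic for topic, score in zip(topics, scores) if score >= threshold]
```

Unlike single-label classification, there is no argmax over topics here: every topic gets its own yes/no decision, which is why an article can end up with several labels.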

Training and development datasets

The training and development datasets contain the publicly available text of over 30 thousand COVID-19-related articles and their metadata (e.g., title, abstract, journal). Articles in both datasets have been manually reviewed, and articles have been annotated by in-house models.

Evaluation dataset

Like the training and development datasets, the evaluation dataset contains articles that have been manually reviewed. Participants will return predictions for the entire set. Submissions will be evaluated using both label-based and instance-based metrics that are commonly applied for multi-label classification. Evaluation scripts will be provided.
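As a rough illustration of the difference between the two metric families, the sketch below computes label-based F1 (macro- and micro-averaged over topics) and instance-based F1 (averaged over articles) from sets of gold and predicted topics. It is not the official evaluation script. The function names, the choice of F1, and the convention of scoring an empty case as 0.0 are all assumptions made for this example.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_based_f1(gold, pred, topics):
    """Return (macro_f1, micro_f1) computed per topic across all articles.

    gold and pred are parallel lists of sets of topic names.
    """
    per_label = []
    total_tp = total_fp = total_fn = 0
    for topic in topics:
        tp = sum(1 for g, p in zip(gold, pred) if topic in g and topic in p)
        fp = sum(1 for g, p in zip(gold, pred) if topic not in g and topic in p)
        fn = sum(1 for g, p in zip(gold, pred) if topic in g and topic not in p)
        per_label.append(f1(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    macro = sum(per_label) / len(per_label)   # every topic weighs the same
    micro = f1(total_tp, total_fp, total_fn)  # every decision weighs the same
    return macro, micro


def instance_based_f1(gold, pred):
    """Average of the F1 scores computed separately for each article."""
    scores = [f1(len(g & p), len(p - g), len(g - p)) for g, p in zip(gold, pred)]
    return sum(scores) / len(scores)
```

Label-based scores show how well each topic is predicted, so rare topics count as much as common ones under macro averaging. Instance-based scores show how complete and correct the label set is for a typical article.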

Important dates:

Training and development set release: 15th June. The datasets can be accessed here.
Evaluation script: 15th July. The evaluation script can be accessed here.
Webinar: 22nd July, 9-10 am EST. The slides and video can be accessed here.
Test set release: 22nd August. The test set can be accessed here. The readme document in the same place provides more information on the test set. A sample prediction file can be found here.
Test set prediction submission instructions: 2nd September. The instructions can be accessed here (under the Submission instructions section).
Test set prediction submission due: 12th September
Test set evaluation returned to participants: 27th September
Short technical systems description paper due: 13th October
Paper acceptance and review returned: late October
Test set with gold standard labels release: 9th December. The dataset (BC7-LitCovid-Test-GS.csv) and the updated readme file (BC7-LitCovid-Readme.pdf) can be accessed here.

Task organizers:

Qingyu Chen, National Library of Medicine

Alexis Allot, National Library of Medicine

Rezarta Islamaj, National Library of Medicine

Robert Leaman, National Library of Medicine

Zhiyong Lu, National Library of Medicine

Contact

If you have any questions, please contact qingyu.chen@nih.gov with the subject heading "BioCreative Track 5 LitCovid questions".

Status updates and FAQs

The training and development datasets have been released. Please access the datasets here.

The evaluation script has been released. Please access it here.

Webinar: 22nd July, 9-10 am EST. The slides and video can be accessed here.

The test set (BC7-LitCovid-Test.csv) has been released. Please access the dataset here.
