
BioCreative VIII

Track 4: Clinical Annotation Tool Track [2023-01-22]

The BioCreative challenges have traditionally encouraged and supported the development of automated tools that improve the data extraction process according to the needs of biocurators, i.e., biological database experts. Recognizing the need for freely available, time-saving tools that support the creation of high-quality expert-curated resources, the goal of the BioCreative 2023 Annotation Tool Track is to foster the development of such annotation systems, with a particular focus on clinical content processing.

Task description

A key aspect of developing practically useful text mining and NLP tools, particularly in specialized application domains such as biomedicine, is to engage end users directly during the implementation stages. The Interactive tracks of past BioCreative efforts explored expert feedback and human-in-the-loop scenarios to enable direct interaction between biocurators (biological data curators) and developers of web-based text mining systems.

A considerable number of clinical NLP components have been developed over the past years, some of them in the context of shared tasks such as i2b2/n2c2 or CLEF. Moreover, in addition to annotation tools primarily used for processing biomedical literature, an increasing number of clinical NLP platforms such as CogStack or cTAKES are being developed and used. Thus, there is a clear need not only to promote the integration of existing individual clinical NLP components into annotation tools, but also to support the interactive testing, exploration, and definition of functional requirements that make such resources practically useful for clinical end users.

This track will follow a format similar to previous BioCreative Interactive tracks, which focused on exploring interactions between biocurators and web/API interfaces with a text mining back-end. The BioCreative VIII annotation tool track calls for demonstrations of annotation tools that can facilitate the work of clinical and biological domain experts, offering seamless integration with relevant ontologies/vocabularies and other features that improve user experience and efficiency, with a focus on clinical data or data with clinical relevance. Selected tools will be evaluated remotely, and results will be reported at a dedicated online event, the BioCreative workshop Jamboree Day. Selected teams will be invited to present at the BioCreative Workshop at AMIA.

Annotation tool submissions can be new annotation tools or improvements upon previously existing annotation tools (example open source annotation tools and example tool functionality are listed below).

This track will consist of two subtracks:

  • Annotation tools facilitating clinical data structuring: these tools can integrate a full ontology or vocabulary, such as MedDRA, ICD10, SNOMED CT, RxNorm, ATC codes, etc., or, for biomedical domains, Gene, UniProt, Disease Ontology, MeSH, etc., and allow full user interaction with this ontology/vocabulary.
  • Annotation tools facilitating clinical variable extraction: these tools focus on the annotation of a provided list of concept identifiers from predefined terminologies (e.g., SNOMED CT, MeSH IDs, or similar), and allow the user to interact with the selected ontology/vocabulary when annotating the selected topic concept (e.g., hypertension and all its sub-terms in SNOMED CT).

Teams interested in participating in this track should express their interest by May 31, 2023.

Annotation Tool Requirements

  • Public availability and the ability to run from a website, OR availability of a local setup to accommodate data with privacy concerns, such as clinical records
  • Interoperability -- all systems should be able to accept input data in a given format and allow easy download of data in that format, so that results can be compared, combined, and evaluated. (Example formats linked below)
  • Support of entity annotation and normalization to at least one ontology/standardized vocabulary, providing efficient curation through effective human-machine interaction (Example ontologies linked below)
  • Able to support documents added locally (clinical records, patent records, other local documents) or journal articles retrieved from PubMed/PMC

Entity Annotation and Normalization

The annotation tool must support entity-level annotation:

  • Ability to create new annotations, edit existing annotations, delete existing annotations, and apply the same annotation to all occurrences in the given document (a minimal sketch of these operations follows this list)
  • Ability to highlight text (ability to annotate words, parts of words, and/or sentences, as needed)
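
To make these operations concrete, here is a minimal sketch of an in-memory annotation structure supporting create, delete, and apply-to-all-occurrences; the class and method names are illustrative assumptions, not a required interface.

```python
# Minimal sketch of the entity-level annotation operations listed above;
# names and structure are illustrative, not a prescribed design.
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int      # character offsets into the document text
    end: int
    label: str      # e.g. an entity type or ontology identifier

class AnnotatedDocument:
    def __init__(self, text: str):
        self.text = text
        self.annotations: list[Annotation] = []

    def create(self, start: int, end: int, label: str) -> Annotation:
        ann = Annotation(start, end, label)
        self.annotations.append(ann)
        return ann

    def delete(self, ann: Annotation) -> None:
        self.annotations.remove(ann)

    def apply_to_all_occurrences(self, ann: Annotation) -> None:
        """Annotate every other occurrence of the same surface string."""
        span = self.text[ann.start:ann.end]
        pos = self.text.find(span)
        while pos != -1:
            if pos != ann.start:
                self.create(pos, pos + len(span), ann.label)
            pos = self.text.find(span, pos + 1)
```

Editing an annotation would amount to updating its offsets or label; a real tool would also keep the document display in sync with these operations.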

The annotation tool must support entity-level normalization:

  • Ability to programmatically access the selected ontology/vocabulary
  • Search and browsing capabilities for ontology/vocabulary terms, with support for selecting the appropriate ontology term
  • Suggestions to the curator from the selected ontology/vocabulary (a minimal lookup sketch follows this list)
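
As one illustration of programmatic access and curator suggestions, the sketch below queries the public NCBI E-utilities search endpoint against MeSH for a highlighted span. The function name, result limit, and the choice of MeSH are assumptions made for this example; any of the vocabularies listed below could be used instead.

```python
# Illustrative vocabulary lookup for normalization suggestions, using the
# public NCBI E-utilities esearch endpoint against the MeSH database.
# The function name and result limit are assumptions made for this sketch.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def suggest_mesh_ids(span_text: str, max_hits: int = 5) -> list[str]:
    """Return candidate MeSH UIDs for a text span highlighted by the curator."""
    params = {"db": "mesh", "term": span_text, "retmax": max_hits, "retmode": "json"}
    resp = requests.get(EUTILS, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    # An annotation tool would call this when the curator highlights a span
    # and present the returned identifiers as normalization suggestions.
    print(suggest_mesh_ids("hypertension"))
```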

Suggested Ontologies/Vocabularies:

The tool should interact with at least one of these ontologies/vocabularies: ICD10, MedDRA, HPO, MeSH, Gene, SNOMED CT, UMLS, LOINC, RxNorm, ATC codes. Please contact the organizers with other suggestions.

Data Formats:

The tool should support at least one of these data formats: BioC, BRAT, …
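
As an illustration of what standoff interoperability involves, the sketch below reads entity (T) and normalization (N) records from a BRAT .ann file. It follows the publicly documented standoff layout, skips relations, events, and discontinuous spans, and is meant only to show the kind of structure tools would exchange, not a required implementation.

```python
# Illustrative reader for BRAT standoff (.ann) entity and normalization lines.
# Only T (text-bound) and N (normalization) records are handled in this sketch.
from dataclasses import dataclass, field

@dataclass
class Entity:
    eid: str                                  # e.g. "T1"
    etype: str                                # e.g. "Disease"
    start: int                                # character offsets into the .txt file
    end: int
    text: str
    norms: list[str] = field(default_factory=list)  # e.g. ["SNOMEDCT:38341003"]

def parse_ann(path: str) -> dict[str, Entity]:
    entities: dict[str, Entity] = {}
    norm_records = []
    for line in open(path, encoding="utf-8"):
        if line.startswith("T"):              # text-bound: "T1<TAB>Disease 100 112<TAB>hypertension"
            eid, meta, text = line.rstrip("\n").split("\t")
            if ";" in meta:                   # discontinuous spans are skipped in this sketch
                continue
            etype, start, end = meta.split(" ")
            entities[eid] = Entity(eid, etype, int(start), int(end), text)
        elif line.startswith("N"):            # normalization: "N1<TAB>Reference T1 SNOMEDCT:38341003<TAB>..."
            norm_records.append(line.rstrip("\n").split("\t")[1])
    for meta in norm_records:
        _, target, ref = meta.split(" ")
        if target in entities:
            entities[target].norms.append(ref)
    return entities
```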

Examples of open source annotation tools:

Teams could select any of the existing open source annotation tools and add new features and functionalities. Examples of existing tools: TeamTat, BRAT, Markyt, …

Evaluation

Each submission will be evaluated by at least three distinct users (a committee member, a fellow workshop participant, and a selected domain expert), who will review the submitted materials, test the tool, and fill in an evaluation form.

Proposed Timeline Steps

  • April 15 - Announcement and invitation for participation
  • May 31st - Team registration ends. Upon registration, teams will receive sample data from each of the three datasets from BioCreative VIII, to help them get started.
  • June - Mid-way check-in. (Zoom meeting to answer any questions, link and exact date will be provided)
  • Early September - System description submission
  • Late September - Evaluation results returned
  • Late September - Final video submission to be included in the YouTube playlist
  • TBD - Virtual Annotation Day Jamboree
  • November - Select tools invited to present at the workshop at AMIA

System Submission Requirements

Participating teams will be asked to prepare a 2-page document that should:

  • describe the proposed system and the tasks it can perform,
  • mention whether it is a new tool or an improvement upon an existing system,
  • provide a URL for accessing the system (it does not need to be the final version),
  • address how the tool fulfills the annotation tool requirements:
    • Tool Relevance and Impact: The teams should clearly describe: i) the target user community; ii) interoperability (e.g., input and output formats, standards adopted); iii) example use cases for the application.
    • Data annotation via the selected ontologies/vocabularies (e.g. how the tool provides efficient annotation)
    • User Interactivity (e.g. how the user interface helps save annotator time via functionality such as highlighting, sorting, filtering, editing, and finalizing results).
  • include a data sample and an annotation scenario to be used in the closed evaluation (see the Evaluation section above), and
  • include a (link to a) video presentation describing the tool (max. 5 minutes)

The web server must be functional and contain help pages or tutorials, and browser compatibility must be indicated.

Tool submissions will be selected based on their relevance to the clinical and biocuration domains and the reported maturity of the system.

Send your submission via email to biocreativechallenge_AT_gmail.com with Subject: TRACK4 proposal. Please make sure to indicate your name, team name, and your team member names.

Track Organizers

  • Rezarta Islamaj, National Library of Medicine
  • Cecilia Arighi, University of Delaware
  • Lynette Hirschman, MITRE
  • Graciela Gonzalez, Cedars-Sinai Medical Center
  • Martin Krallinger, Barcelona Supercomputing Center
  • Zhiyong Lu, National Library of Medicine