
BioCreative V

Track 5 - User Interactive Task (IAT) [2014-12-18]

Organizers

    Cecilia Arighi, Qinghua Wang and Lynette Hirschman

Introduction

From the beginning, the BioCreative Workshop organizers have worked closely with the biocuration community to understand the various curation workflows, the text mining (TM) tools employed in their pipelines, and the major outstanding needs that the text mining community could address. At the BioCreative 2012 Workshop, descriptions of curation workflows from expert-curated databases were reviewed [1] to identify commonalities and differences among these workflows. Compared with a survey conducted in 2009, the 2012 results show that many more databases now use text mining in parts of their curation workflows [1,2]. Although text mining tools can be applied automatically to large corpora, they are not meant to replace biocuration, but rather to aid the biocuration task. The interaction between users and text mining tools is therefore an important factor in the tools’ success and wide adoption by the biocuration community. To address potential barriers to the biocuration community’s use of text mining tools, the BioCreative team has conducted user requirement analyses and user-based evaluations, and has fostered standards development for system re-use and integration. In this respect, the BioCreative Interactive Text Mining Track (IAT) introduced in BioCreative III [3,4] has served as a means to observe the approaches, standards, and functionalities used by state-of-the-art text mining systems with potential applications in biocuration. The IAT track also gives biocurators a way to be directly involved in the testing and improvement of text mining systems.

Task Description

The interactive track follows a format similar to previous IAT tracks, which have focused on exploring interactions between biocurators and web/API interfaces with a text mining back-end. The utility and usability of TM tools, as well as the generation of use cases, have been the focal points of the previous IAT tracks [3-5]. The tasks for the TM tools are user-defined and formally evaluated by biocurators. The activity is conducted remotely, and results are reported at the BioCreative workshop. The track is open to any biocuration task; examples of previous biocuration tasks can be found here. We ask participating teams to engage two end users to assist in the development phase, dataset selection, and dataset annotation for evaluation purposes. The organizers, together with the User Advisory Group, will select appropriate biocurators for the testing phase.
Interested teams should submit an email expressing their intent to participate in this task by January 31, 2015.

Participating teams should then submit via email, by March 31, 2015, a document describing the system and the proposed biocuration task(s), providing the URL of a functioning system and addressing the following aspects:

    a) Relevance and Impact: The teams are asked to describe: i) the targeted user community, and the needs and standards used by this community; ii) the task proposed and how it aligns with the guidelines set by the targeted user community; and iii) use cases for the application.
    b) User Interactivity: The teams are required to provide web-based text mining systems with user interactivity, such as highlighting, editing, correcting, sorting, and/or exporting of results. The minimal requirements for biocuration are listed below.
  • Mandatory requirements:
    a) The system should indicate browser compatibility, and should be compatible with at least two of the following: Firefox, Chrome, Safari, or Internet Explorer.
    b) The system should highlight entities and relationships (if applicable) relevant to the annotation task. We encourage color coding of entity types and links to relevant data sources where appropriate.
    c) The user should be able to edit the text mining results by correcting errors or adding missing information, and should be able to export the corrected data.
    d) The system should use standard input and output formats and, if possible, support more than one format type. Input: for biomedical literature, a document ID (e.g., PubMed ID or PubMed Central ID) and/or document text (e.g., plain text, HTML, PDF, or XML); for other types of input, use adopted standards where possible, chosen in consultation with the user community. Output: export the results at least in tab-delimited and XML formats; we also strongly encourage teams to adopt the (XML-based) BioC format. A minimal export sketch is given after the requirements list below.

  • Other strongly desired requirements (Optional):
    a) Full-text processing, at least for the PubMed Central open-access articles.
    b) Interactive disambiguation of domain entities.
    c) Ability to filter/sort the results according to different criteria, and to rank results by relevance to the user.

  • Additional requirements (Optional):
    a) An on/off switch for the text mining tool, allowing manual annotation in off mode.
    b) Time recording: the ability to record the duration of each user's curation session (this also requires user login).
    c) Loading of curation suggestions or warnings for display during curation.
    d) Upload of a gene list or ontology-term list as input for focused curation.
    c) System Performance: An in-house evaluation of the system is requested to show that the system performs reasonably well prior to participation in the IAT track's user testing. The teams should (i) describe the dataset used in the benchmarking and the people involved in annotating it; (ii) report the metrics used in the evaluation (e.g., precision, recall, F-measure, mean average precision (MAP)); and (iii) indicate how the metrics were calculated (e.g., at the sentence level or at the document level).
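
To make mandatory requirement (d) above concrete, below is a minimal Python sketch of a tab-delimited and a simplified BioC-style XML export. The annotation tuple, document ID, and file names are illustrative assumptions, and real BioC collections carry additional metadata (source, date, key, infons); consult the BioC specification for the full format.

    import xml.etree.ElementTree as ET

    # Illustrative annotations: (doc_id, entity_type, offset, length, text).
    annotations = [("PMC1234567", "Gene", 105, 5, "BRCA1")]

    # Tab-delimited export: one annotation per line.
    with open("results.tsv", "w") as out:
        out.write("doc_id\ttype\toffset\tlength\ttext\n")
        for doc_id, etype, offset, length, text in annotations:
            out.write(f"{doc_id}\t{etype}\t{offset}\t{length}\t{text}\n")

    # Simplified BioC-style XML export: collection > document > passage >
    # annotation, with the entity type stored as an infon and the span
    # recorded in a location element.
    collection = ET.Element("collection")
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = "PMC1234567"
    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"
    for i, (doc_id, etype, offset, length, text) in enumerate(annotations):
        ann = ET.SubElement(passage, "annotation", id=str(i))
        ET.SubElement(ann, "infon", key="type").text = etype
        ET.SubElement(ann, "location", offset=str(offset), length=str(length))
        ET.SubElement(ann, "text").text = text
    ET.ElementTree(collection).write("results.bioc.xml", encoding="utf-8")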

Proposals will be selected based on their relevance to the biocuration task and on the reported performance of the text mining tool. Teams will be informed of acceptance by April 10, 2015.

Evaluation
In the IAT track activity, systems will be evaluated on (i) performance (time-on-task and accuracy of the text mining-assisted task compared with manual or some reference curation), and (ii) a subjective measure via a user questionnaire (see the last survey as an example). The user survey will cover six main topics: (1) overall reaction; (2) the system's ability to help complete tasks; (3) design of the application; (4) learning to use the application; (5) usability of the system; and (6) recommendation of the system. These topics were selected based on the Questionnaire for User Interface Satisfaction (QUIS) developed by Chin et al., which has been shown to be a reliable guide to understanding user reactions [6]. There will be two levels of evaluation: one focused on the overall system, in which the user performs a real biocuration task so that both performance and usability are evaluated; the other focused on the usability of the website, in which the user performs basic pre-defined tasks and reports on success in achieving them. In BioCreative V, we propose to conduct individual on-site usability tests with a small subset of users, which will allow the users to be observed and asked why they performed a given action.
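
For the accuracy component, results are typically summarized with the metrics named under System Performance above. Below is a minimal sketch of document-level precision, recall, and F-measure, assuming annotations are reduced to (document ID, entity ID) pairs; the pair representation and all identifiers are illustrative, not a track-defined format.

    # Reference ("gold") and system-predicted annotations as
    # (document ID, entity ID) pairs; the identifiers are made up.
    gold = {("PMID1", "P38398"), ("PMID1", "P04637"), ("PMID2", "P04637")}
    predicted = {("PMID1", "P38398"), ("PMID2", "P04637"), ("PMID2", "Q00987")}

    tp = len(gold & predicted)  # true positives: pairs found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")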

User Recruitment
To make this activity successful, recruiting users who are representative of the target community is essential. We expect that the teams are familiar with their user community and are able to commit two users to be involved throughout the development and evaluation process. Committed users will help in the selection and annotation of the dataset to be tested. The names of these users should be provided in the letter of intent. Additional users will be recruited by the organizers for the evaluation phase.

We hope that participation in this activity will expand user adoption of text mining tools, and we expect the IAT task to create new collaborations between text miners and users. The usability test will give the participating teams insight into their interfaces, and this feedback can be used to improve the user experience. We expect that participating systems will increase the efficiency of curation and subsequently be considered as potential tools in the biocuration workflow.

Information for biocurators is available at http://www.biocreative.org/tasks/biocreative-v/iat-task-biocurators/

Track timeline

Note that the timeline is subject to change.
  • January 31, 2015: Submission of letter of intent by teams. Send email to Cecilia Arighi, arighi@dbi.udel.edu with Subject:BC5_intent
  • March 31, 2015: Submission of system document by teams. Send email to Cecilia Arighi, arighi@dbi.udel.edu with Subject:BC5_system
  • April 10, 2015: Notification of accepted systems
  • April-May 2015: Developing individual systems and iterative system integration
  • Mid-June 2015: Individual systems due
  • June-July 2015: Overall system evaluation
  • August 2015: Final paper deadline
  • September 2015: BioCreative V workshop in Spain

References

      1. Lu, Z. and Hirschman, L. (2012) Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II. Database (Oxford), 2012.
      2. Hirschman, L., Burns, G.A., Krallinger, M., Arighi, C., Cohen, K.B., Valencia, A., Wu, C.H., Chatr-Aryamontri, A., Dowell, K.G., Huala, E. et al. (2012) Text mining for the biocuration workflow. Database (Oxford), 2012, bas020.
      3. Arighi, C., Roberts, P., Agarwal, S., Bhattacharya, S., Cesareni, G., Chatr-aryamontri, A., Clematide, S., Gaudet, P., Giglio, M., Harrow, I. et al. (2011) BioCreative III interactive task: an overview. BMC Bioinformatics, 12, S4.
      4. Arighi, C.N., Carterette, B., Cohen, K.B., Krallinger, M., Wilbur, W.J., Fey, P., Dodson, R., Cooper, L., Van Slyke, C.E., Dahdul, W. et al. (2012) An Overview of the BioCreative 2012 Workshop Track III: Interactive Text Mining Task. Database (Oxford), 2012.
      5. Matis-Mitchell, S., Roberts, P., Tudor, C.O. and Arighi, C.N. (2013) BioCreative IV Interactive Task. Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, Bethesda, MD, Vol. 1, pp. 190-203.
      6. Chin, J.P., Diehl, V.A. and Norman, K.L. (1988) Development of an Instrument Measuring User Satisfaction of the Human-Computer Interface. Proceedings of ACM CHI Conference on Human Factors in Computing Systems, 213-218.
