Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative V

Track 5 - User Interactive Task (IAT) [2014-12-18]


    Cecilia Arighi, Qinghua Wang and Lynette Hirschman


From the beginning, the BioCreative Workshop organizers have worked closely with the biocuration community to understand its curation workflows, the text mining (TM) tools employed in its pipelines, and the next major needs that the text mining community could address. At the BioCreative 2012 Workshop, descriptions of curation workflows from expert-curated databases were reviewed [1] to identify commonalities and differences among these workflows. Compared with a survey conducted in 2009, the 2012 results show that many more databases now use text mining in parts of their curation workflows [1,2]. Although text mining tools can be applied automatically to large corpora, they are not meant to replace biocuration but rather to aid curators in the biocuration task. The interaction between users and text mining tools is therefore an important factor in the tools' success and wide adoption by the biocuration community. To address potential barriers to the use of text mining tools in biocuration, the BioCreative team has conducted user requirement analyses and user-based evaluations, and has fostered the development of standards for system reuse and integration. In this respect, the BioCreative Interactive Text Mining Track (IAT), introduced in BioCreative III [3,4], has served as a means to observe the approaches, standards, and functionalities used by state-of-the-art text mining systems with potential applications in biocuration. The IAT track also gives biocurators a way to be directly involved in testing and improving text mining systems.

Task Description

The interactive track follows a format similar to previous IAT tracks, which have focused on exploring interactions between biocurators and web/API interfaces with a text mining back-end. The utility and usability of TM tools, as well as the generation of use cases, have been the focal points of the previous IAT tracks [3-5]. The tasks performed with the TM tools are user-defined and are formally evaluated by biocurators. The activity is conducted remotely, and results are reported at the BioCreative workshop. The track is open to any biocuration task; examples of previous biocuration tasks can be found here. We ask each participating team to engage two end users to assist with system development, dataset selection, and annotation of the evaluation dataset. The organizers, together with the User Advisory Group, will choose appropriate biocurators for the testing phase.
Interested teams are asked to send an email expressing their interest in participating in this task by January 31, 2015.

By March 31, 2015, interested teams should submit by email a document that describes the system and the proposed biocuration task(s), provides the URL of a functioning system, and addresses the following aspects:

    a) Relevance and Impact: Teams are asked to describe: i) the targeted user community, its needs, and the standards it uses; ii) the proposed task and how it aligns with the guidelines set by the targeted user community; and iii) use cases for the application.
    b) User Interactivity: Teams are required to provide web-based text mining systems with user interactivity, such as highlighting, editing, correcting, sorting, and/or exporting results. The minimal requirements for biocuration are listed below.
  • Mandatory requirements:
    a) The system should indicate browser compatibility and must be compatible with at least two of the following: Firefox, Chrome, Safari, or Internet Explorer.
    b) The system should highlight entities and relationships (if applicable) relevant to the annotation task. We encourage color coding of entity types and links to relevant data sources where appropriate.
    c) The user should be able to edit the text mining results by correcting errors or adding missing information, and should be able to export the corrected data.
    d) The system should use standard input and output formats and, if possible, support more than one format. Input: for biomedical literature, accept document IDs (e.g., PubMed IDs or PubMed Central IDs) and/or document text (e.g., plain text, HTML, PDF, or XML); for other types of input, use standards adopted by the user community where possible, chosen in consultation with that community. Output: export the results in at least tab-delimited and XML formats. We also strongly encourage teams to adopt the XML-based BioC format.

  • Other strongly desired requirements (Optional):
    a) Full-text processing, covering at least the PubMed Central open access articles.
    b) Interactive disambiguation of domain entities.
    c) Ability to filter and sort the results according to different criteria, and to rank results by their relevance to the user.

  • Additional requirements (Optional):
    a) An on/off switch for the text mining tool, allowing manual annotation in off mode.
    b) Time recording: the ability to record the duration of each user's curation session (this also requires user login).
    c) The ability to load curation suggestions or warnings for display during curation.
    d) The ability to upload a gene list or ontology term list as input for focused curation.
    c) System Performance: An in-house evaluation is requested to show that the system performs reasonably well before it takes part in the IAT track's user testing. Teams should (i) describe the dataset used for benchmarking and the people who annotated it; (ii) report the metrics used in the evaluation (e.g., precision, recall, F-measure, mean average precision (MAP)); and (iii) indicate how the metrics were calculated (e.g., at the sentence level or at the document level).
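The metrics named in the System Performance item can be computed in a few lines. The sketch below is illustrative only and assumes that predictions and gold annotations are represented as hashable items; the function names `prf`, `average_precision`, and `mean_average_precision` are hypothetical helpers, not part of any BioCreative toolkit. The unit of each item decides the evaluation level: `(doc_id, entity_id)` pairs give document-level scores, while adding a sentence index gives sentence-level scores.

```python
def prf(predicted, gold):
    """Precision, recall and balanced F-measure over sets of items.

    Items are whatever unit the evaluation counts, e.g. (doc_id, entity_id)
    for document-level scoring or (doc_id, sentence_idx, entity_id) for
    sentence-level scoring (an assumed representation).
    """
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: items in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant item's rank
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP over (ranked_list, relevant_set) pairs, e.g. one pair per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, with the toy items `{("d1", "BRCA1"), ("d1", "TP53"), ("d2", "EGFR")}` as predictions and four gold items of which two match, `prf` gives precision 2/3, recall 1/2, and F-measure 4/7. Reporting which item representation was used answers point (iii) directly.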
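Mandatory requirement (d) asks for export in at least tab-delimited and XML formats, with BioC encouraged. The following is a minimal sketch of both exports using only the Python standard library. It assumes a simple in-memory `Annotation` record (a hypothetical type) and emits a BioC-style structure (`collection` > `document` > `passage` > `annotation` with `infon`, `location`, and `text` children); it has not been validated against the BioC DTD, so teams should check the official BioC specification before relying on it.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Annotation:
    """One (possibly curator-corrected) mention; a hypothetical record type."""
    ann_id: str
    entity_type: str  # e.g. "Gene"; the label set is an assumption
    offset: int       # character offset of the mention in the passage
    text: str         # surface text of the mention


def to_tsv(doc_id, annotations):
    """Serialize annotations as tab-delimited rows, one mention per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["doc_id", "ann_id", "type", "offset", "length", "text"])
    for a in annotations:
        writer.writerow([doc_id, a.ann_id, a.entity_type, a.offset, len(a.text), a.text])
    return buf.getvalue()


def to_bioc_xml(doc_id, passage_text, annotations, source="", date=""):
    """Build a minimal BioC-style collection with one document and one passage."""
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = source
    ET.SubElement(collection, "date").text = date
    ET.SubElement(collection, "key").text = ""
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id
    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"  # passage starts the document
    ET.SubElement(passage, "text").text = passage_text
    for a in annotations:
        ann = ET.SubElement(passage, "annotation", id=a.ann_id)
        ET.SubElement(ann, "infon", key="type").text = a.entity_type
        ET.SubElement(ann, "location", offset=str(a.offset), length=str(len(a.text)))
        ET.SubElement(ann, "text").text = a.text
    return ET.tostring(collection, encoding="unicode")
```

Because both exports are produced from the same list of corrected annotations, curator edits made in the interface (requirement c) flow into every output format consistently.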

Proposals will be selected based on their relevance to the biocuration task and the reported performance of the text mining tool. Teams will be notified of acceptance by April 10, 2015.

In the IAT track activity, systems will be evaluated on (i) performance, measured as time-on-task and accuracy of the text mining-assisted task compared with manual or reference curation, and (ii) a subjective measure obtained through a user questionnaire. See the survey from the previous IAT track as an example. The user survey will cover six main topics: (1) overall reaction; (2) the system's ability to help complete tasks; (3) design of the application; (4) learning to use the application; (5) usability of the system; and (6) recommendation of the system. These topics are based on the Questionnaire for User Interface Satisfaction (QUIS) developed by Chin et al., which has been shown to be a reliable guide to understanding user reactions [6]. There will be two levels of evaluation. The first focuses on the overall system: users perform a real biocuration task, so that both performance and usability are evaluated. The second focuses on the usability of the website: users perform basic predefined tasks and report whether they succeeded. In BioCreative V, we also propose conducting individual on-site usability tests with a small subset of users, so that users can be observed and asked why they performed a given action.
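The performance part of the evaluation compares text mining-assisted curation with manual or reference curation on time-on-task and accuracy. The sketch below shows one simple way such a comparison could be summarized; it is an illustration, not the organizers' protocol. It assumes per-session times in minutes and curation decisions stored as item-to-label dictionaries, and the function names and toy numbers in the usage note are hypothetical.

```python
from statistics import mean


def time_saving(manual_minutes, assisted_minutes):
    """Relative reduction in mean time-on-task with text mining assistance.

    Returns a value in (-inf, 1]; 0.35 means assisted sessions took 35% less time.
    """
    manual, assisted = mean(manual_minutes), mean(assisted_minutes)
    return (manual - assisted) / manual


def decision_accuracy(decisions, reference):
    """Fraction of reference items whose curator decision matches the reference label.

    Items the curator did not decide on count as disagreements.
    """
    if not reference:
        return 0.0
    agree = sum(1 for item, label in reference.items() if decisions.get(item) == label)
    return agree / len(reference)
```

With toy timings of 30, 40, and 50 minutes for manual sessions and 20, 30, and 28 minutes for assisted sessions, `time_saving` returns 0.35. Reporting time and accuracy together matters: a large time saving is only useful if accuracy against the reference curation does not drop.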

User Recruitment
To make this activity successful, it is essential to recruit users who are representative of the target community. We expect teams to be familiar with their user community and able to recruit two users who commit to being involved throughout development and evaluation. These committed users will help select and annotate the dataset to be tested. The names of the users should be provided in the letter of intent. The organizers will recruit additional users for the evaluation phase.

We hope that participation in this activity will broaden user adoption of text mining tools. We expect participation in the IAT task to create new collaborations between text miners and users. The usability test will give participating teams insight into their interfaces, and the feedback can be used to improve the user experience. We expect that participating systems will increase the efficiency of curation and subsequently be considered for adoption in biocuration workflows.

Information for biocurators is available at

Track timeline

Note that the timeline is subject to change.
  • January 31, 2015: Submission of letter of intent by teams. Send an email to Cecilia Arighi with the subject line: BC5_intent
  • March 31, 2015: Submission of system document by teams. Send an email to Cecilia Arighi with the subject line: BC5_system
  • April 10, 2015: Notification of accepted systems
  • April-May 2015: Developing individual systems and iterative system integration
  • Mid-June 2015: Individual systems due date
  • June-July 2015: Overall system evaluation
  • August 2015: Final paper deadline
  • September 2015: BioCreative V workshop in Spain
References

      1. Lu, Z. and Hirschman, L. (2012) Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II. Database (Oxford), 2012, bas043.
      2. Hirschman, L., Burns, G.A., Krallinger, M., Arighi, C., Cohen, K.B., Valencia, A., Wu, C.H., Chatr-Aryamontri, A., Dowell, K.G., Huala, E. et al. (2012) Text mining for the biocuration workflow. Database (Oxford), 2012, bas020.
      3. Arighi, C., Roberts, P., Agarwal, S., Bhattacharya, S., Cesareni, G., Chatr-aryamontri, A., Clematide, S., Gaudet, P., Giglio, M., Harrow, I. et al. (2011) BioCreative III interactive task: an overview. BMC Bioinformatics, 12, S4.
      4. Arighi, C.N., Carterette, B., Cohen, K.B., Krallinger, M., Wilbur, W.J., Fey, P., Dodson, R., Cooper, L., Van Slyke, C.E., Dahdul, W. et al. (2012) An overview of the BioCreative 2012 Workshop Track III: interactive text mining task. Database (Oxford).
      5. Matis-Mitchell, S., Roberts, P., Tudor, C.O. and Arighi, C.N. (2013) Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, Bethesda, MD, Vol. 1, pp. 190-203.
      6. Chin, J.P., Diehl, V.A. and Norman, K.L. (1988) Development of an Instrument Measuring User Satisfaction of the Human-Computer Interface. Proceedings of ACM CHI Conference on Human Factors in Computing Systems, 213-218.