
BioCreative IV

Track 5 - User Interactive Task (IAT) [2012-11-15]

Organizers

    Cecilia Arighi, Sherri Matis, Phoebe Roberts, Catalina Oana Tudor and Cathy Wu

Motivation & Background

Biomedical relationships curated from the literature are used singly and en masse for general information and computational analyses. Curating relationships from the literature usually involves identifying relevant incoming literature, retrieving the relevant articles, and extracting information that will translate into machine-readable annotations. Sub-disciplines of text mining can address each of these steps. BioCreative (Critical Assessment of Information Extraction in Biology, http://www.biocreative.org/) [1-4] promotes the development of text mining and text processing tools to support the activities of communities of researchers and biocurators. The Interactive Task (IAT), introduced in BioCreative III [6] and continued at the BioCreative 2012 Workshop [7], has increasingly shown that active involvement of a representative group of end users is critical for guiding the development and evaluation of useful tools and standards. For this task, teams are invited to demonstrate a system that can assist in a biocuration task, and biocurators are recruited to test the systems by curating a selected dataset (manually or using the system). In addition, biocurators provide feedback about the utility and usability of the systems by completing a user survey. This task has provided an excellent opportunity to observe the approaches, standards and functionalities used by state-of-the-art text mining systems with potential application in the biocuration domain.

Task Proposal

In BioCreative IV we will continue the interactive task tradition by inviting text mining teams to participate in a formal evaluation of their text mining/NLP systems with respect to a specific biocuration task. The track is open to any biocuration task; examples of previous biocuration tasks can be found here. Interested teams should submit an email expressing their interest in participating in this task by January 15, 2013. Participating teams should then submit a document by June 7, 2013 in which they describe the system and the biocuration task(s), propose two biocurators, and address the following aspects:

    1. Relevance and Impact: Teams should clearly describe: i) the targeted user community, demonstrating familiarity and compliance with that community's needs and standards; ii) the proposed task and how it aligns with the guidelines set by the targeted user community; and iii) use cases for the application.
    2. User Interactivity: We are asking for web-based text mining systems with user interactivity (such as highlighting, editing, and exporting results). Based on discussions with the User Advisory Group, we have compiled a list of requirements categorized as mandatory, strongly desired and additional. We have prepared a document (track5_IAT_Requirements, PDF format) with all requirements, along with examples to clarify the intent of certain functionalities; it can be downloaded from the Downloads section at the end of this page.
    Mandatory requirements:
    a) The system should indicate browser compatibility and should be compatible with at least two of the following: Firefox, Chrome, Safari or Internet Explorer.
    b) The system should highlight entities and relationships (if applicable) relevant to the annotation task. We encourage color coding of entity types and links to relevant data sources when appropriate.
    c) The user should be able to edit the text mining results by correcting errors or adding missing information, and should be able to export the corrected data.
    d) The system should use standard input and output formats and, if possible, support more than one format type. Input: for biomedical literature, a document ID (e.g., PubMed ID or PubMed Central ID) and/or document text (e.g., plain text, HTML, PDF or XML); for other types of input, use adopted standards when possible, in consultation with the user community. Output: export the results at least in tab-delimited and XML formats. We also encourage teams to adopt the BioC format (XML-based) described in the interoperability Track I. (A minimal export sketch in both formats follows this list.)
    Other strongly desired requirements (Optional):
    a) Full-text processing, at least of the PubMed Central open access articles.
    b) Interactive disambiguation of domain entities.
    c) Ability to filter/sort the results according to different criteria; rank results based on what is more relevant to the user.
    Additional requirements (Optional):
    a) An on/off switch for the text mining tool, allowing manual annotation in off mode.
    b) Time recording: the ability to record the duration of each curation session for each user (this also requires user login).
    c) Loading of curation suggestions or warnings for display during curation.
    d) Upload of a gene list or ontology term list as input for focused curation.
    3. System Performance: The system should have undergone some internal benchmarking, and teams should (i) describe the benchmark dataset, including its source (who annotated it) and size; (ii) report metrics: precision, recall and F-measure, and/or mean average precision (MAP); and (iii) indicate the level at which the metrics were calculated (sentence vs. document), which should correspond to the level to be tested in the user evaluation.
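To illustrate requirement 2d above, the sketch below shows one way a system might export the same set of extracted mentions both as a tab-delimited file and as a simplified, BioC-like XML collection. It is a minimal Python example: the record fields (doc_id, entity_type, text, offset, length, db_id) and values are hypothetical, and the XML layout only approximates BioC; teams should follow the official BioC schema from the interoperability track for actual interchange.

    import csv
    import xml.etree.ElementTree as ET

    # Hypothetical extraction results: one record per entity mention.
    # All field names and values here are illustrative, not a prescribed schema.
    results = [
        {"doc_id": "PMID:12345678", "entity_type": "Gene", "text": "BRCA1",
         "offset": 102, "length": 5, "db_id": "HGNC:1100"},
        {"doc_id": "PMID:12345678", "entity_type": "Disease", "text": "breast cancer",
         "offset": 230, "length": 13, "db_id": "MESH:D001943"},
    ]

    def export_tsv(records, path):
        # One tab-delimited row per extracted mention, with a header line.
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow(["doc_id", "entity_type", "text", "offset", "length", "db_id"])
            for r in records:
                writer.writerow([r["doc_id"], r["entity_type"], r["text"],
                                 r["offset"], r["length"], r["db_id"]])

    def export_bioc_like_xml(records, path):
        # Simplified, BioC-like collection/document/annotation layout; consult the
        # official BioC schema (with passages, infons, etc.) for real interoperability.
        collection = ET.Element("collection")
        documents = {}
        for r in records:
            if r["doc_id"] not in documents:
                doc = ET.SubElement(collection, "document")
                ET.SubElement(doc, "id").text = r["doc_id"]
                documents[r["doc_id"]] = doc
            ann = ET.SubElement(documents[r["doc_id"]], "annotation")
            ET.SubElement(ann, "infon", key="type").text = r["entity_type"]
            ET.SubElement(ann, "infon", key="identifier").text = r["db_id"]
            ET.SubElement(ann, "location", offset=str(r["offset"]), length=str(r["length"]))
            ET.SubElement(ann, "text").text = r["text"]
        ET.ElementTree(collection).write(path, encoding="utf-8", xml_declaration=True)

    export_tsv(results, "results.tsv")
    export_bioc_like_xml(results, "results.xml")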

Evaluation

Systems will be evaluated based on (i) performance (time-on-task and accuracy of the text mining-assisted task as compared to manual or some reference curation), and (ii) a subjective measure via a user questionnaire. See example: http://ir.cis.udel.edu/biocreative/survey.html
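As a concrete reference for how the accuracy side of the evaluation and the metrics in item 3 above are typically computed, the sketch below scores a hypothetical set of system annotations against a reference ("gold") curation. The annotation sets, identifiers and ranked lists are invented for illustration only; in the actual evaluation the scoring level (sentence vs. document) is the one agreed upon for the task.

    # Precision, recall and F-measure over sets of (doc_id, annotation) pairs,
    # plus mean average precision (MAP) over ranked result lists.
    def precision_recall_f1(system, gold):
        tp = len(system & gold)                        # correctly predicted annotations
        precision = tp / len(system) if system else 0.0
        recall = tp / len(gold) if gold else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    def mean_average_precision(ranked_runs, gold_by_query):
        # ranked_runs: {query_or_doc_id: [item, item, ...]} in ranked order.
        ap_scores = []
        for query, ranked in ranked_runs.items():
            relevant = gold_by_query.get(query, set())
            hits, ap = 0, 0.0
            for rank, item in enumerate(ranked, start=1):
                if item in relevant:
                    hits += 1
                    ap += hits / rank                  # precision at each relevant rank
            ap_scores.append(ap / len(relevant) if relevant else 0.0)
        return sum(ap_scores) / len(ap_scores) if ap_scores else 0.0

    # Hypothetical document-level gene annotations.
    gold = {("PMID:1", "HGNC:1100"), ("PMID:1", "HGNC:11998"), ("PMID:2", "HGNC:1100")}
    system = {("PMID:1", "HGNC:1100"), ("PMID:2", "HGNC:1100"), ("PMID:2", "HGNC:3430")}
    print(precision_recall_f1(system, gold))           # approximately (0.67, 0.67, 0.67)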

User Recruitment

To make this activity successful, recruitment of users who are representative of the target community is essential. We expect that teams are familiar with their user community and are able to recruit and commit two users to be involved throughout the development and evaluation process. Committed users will help in the selection and annotation of the dataset to be tested. The names of the users should be provided in the letter of intent. Additional users will be recruited for the evaluation phase by the organizers.

We hope that participation in this activity will expand user adoption of text mining tools. We will provide more details about the timeline and the track in follow-up correspondence.

References

1. Hirschman L, Yeh A, Blaschke C, Valencia A: Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics 2005, 6(Suppl 1):S1.

2. Krallinger M, Morgan A, Smith L, Leitner F, Tanabe L, Wilbur J, Hirschman L, Valencia A: Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge. Genome Biology 2008, 9(Suppl 2):S1.

3. Leitner F, Mardis SA, Krallinger M, Cesareni G, Hirschman LA, Valencia A: An Overview of BioCreative II.5. IEEE/ACM Trans Comput Biol Bioinform 2010, 7(3):385-399.

4. Arighi CN, Lu Z, Krallinger M, Cohen KB, Wilbur WJ, Valencia A, Hirschman L, Wu CH: Overview of the BioCreative III Workshop. BMC Bioinformatics 2011, 12(Suppl 8):S1.

5. Wu CH, Arighi CN, Cohen KB, Hirschman L, Krallinger M, Lu Z, Mattingly C, Valencia A, Wiegers TC, Wilbur WJ: Editorial: BioCreative-2012 Virtual Issue. Database (Oxford) 2012 (in press).

6. Arighi C, Roberts P, Agarwal S, Bhattacharya S, Cesareni G, Chatr-aryamontri A, Clematide S, Gaudet P, Giglio M, Harrow I et al: BioCreative III interactive task: an overview. BMC Bioinformatics 2011, 12(Suppl 8):S4.

7. Arighi CN, Carterette B, Cohen KB, Krallinger M, Wilbur WJ, Fey P, Dodson R, Cooper L, Van Slyke CE, Dahdul W, Mabee P, Li D, Harris B, Gillespie M, Jimenez S, Roberts P, Matthews L, Becker K, Drabkin H, Bello S, Licata L, Chatr-aryamontri A, Schaeffer ML, Park J, Haendel M, Van Auken K, Li Y, Chan J, Muller HM, Cui HM, Balhoff JP, Wu JC, Lu Z, Wei CH, Tudor CO, Raja K, Subramani S, Natarajan J, Cejuela JM, Dubey P, Wu C: An Overview of the BioCreative 2012 Workshop Track III: Interactive Text Mining Task. Database (Oxford) 2012 (in press).

Downloads