
BC Workshop '12

Track III-Interactive TM [2011-09-19]

Track Coordinators

Cecilia Arighi
Martin Krallinger
Kevin Cohen
John Wilbur
Ben Carterette

Table of Contents

  • Invitation to TM teams
  • Invitation to Biocurators
  • Participating Systems
  • Biocurator's Tasks for pre-workshop evaluation
  • Biocurator's Checklist
  • Important dates
  • Downloads
    1- Invitation to Text Mining Teams

    We invite text mining teams to submit system descriptions that highlight a text mining/NLP system with respect to a specific biocuration task. The description should be biocurator-centric: it should provide clear examples of the system's input (e.g., PMID, gene, keyword) and output (list of relevant articles, compound recognition, PPI, etc.), and describe the context in which the system can be used effectively (e.g., the task is only applicable to articles about a given taxon group). The track is open to systems that process abstracts and/or full-length articles.

    The description should be no longer than 6 pages including figures, and Word or RTF formats are preferred.

    All submissions will be evaluated by the BioCreative Organizing Committee according to the following criteria:

    1. Relevance and Impact: Is the system currently being used in a biocuration task/workflow?
    2. Adaptability: Is it robust and adaptable to applications for other related biocuration tasks (i.e., can be utilized by multiple databases/resources)?
    3. Interactivity: Does it provide an interactive web interface for biocurators to test?
    4. Performance: Can the system be benchmarked and provide precision and recall for the task?

    For system evaluation, the participating teams should:

    1. Define a curation task according to the system capabilities
    2. Provide a set of annotated examples (30-50) as a practice test for curators prior to the evaluation. The idea is to provide examples of correct system output for the task, so curators learn what to expect from the system.
    3. Suggest biocurator(s) who could annotate a literature corpus (approximately 50 documents) as the gold standard for evaluation before the workshop
    4. Benchmark the system and submit appropriate metrics (precision/recall/F-measure/MAP) for the given task(s) by March 1, 2012. Note that we request your own benchmarking, which may have already been published, to ensure the system has gone through some formal evaluation.

    The list of selected teams will be posted with accompanying description on or about January 20, 2012. The BioCreative Organizing Committee along with the teams will identify and recruit biocurators that will participate in the evaluation of the system.

    The evaluation will compare time-on-task for manual versus system-assisted activities, as well as the precision and recall of the uncurated and curated sets against the gold standard (generated by the suggested biocurator, blind to the systems).

    To assist teams with this activity, a document with the description of a system and a proposed task is found at the end of this page under Downloads.

    At the workshop

    The selected systems will be presented on the second day of the workshop. A demo session, where the users (biocurators) will have the opportunity to use the systems, will follow.

    Finally, the results and observations from the system evaluation will be presented.


    Accepted descriptions will be published in the workshop proceedings. Documents should be submitted via EasyChair by the March 15 deadline.



    Systems should be web based and compatible with Mozilla Firefox 4.0 or higher.

    2- Invitation to Biocurators

    We invite biocurators to participate in a user study on the text mining system of their choice prior to and at the workshop. The user study will involve (i) manual and text-mining-assisted curation of the literature corpus by the biocurators; (ii) recording of the user interactions with the system (with logs of all queries and clicks); and (iii) a post-study survey.

    3- Participating Systems

    System | Task description | Input | Coordinator
    Textpresso | Curation of subcellular localization using Gene Ontology cellular component | Full-text | Cecilia Arighi
    PCS | Curation of Entity-Quality terms from phylogenetic literature using ontologies | N/A | Cecilia Arighi
    Tagtog | Protein/gene mention recognition via an interactive learning and annotation framework | Abstract | John Wilbur
    PubTator | Document triage (relevant documents for curation) and bioconcept annotation (gene, disease, chemicals) | Abstract | Kevin Cohen
    PPIinterfinder | Mining of protein-protein interactions for human proteins (abstracts and full-length articles): document classification and extraction of interacting proteins and keywords | Abstract | Martin Krallinger
    eFIP | Mining protein interactions of phosphorylated proteins from the literature: document classification and extraction of the phosphorylated protein, protein binding partners, and impact keywords | Abstract | Martin Krallinger
    Acetylation | Document retrieval and ranking based on relevance to protein acetylation | Abstract | Cecilia Arighi
    T-HOD | Document triage for disease-related genes (relevant documents for curation) and bioconcept annotation (gene, disease, and relation) | Abstract | Cecilia Arighi

    More details about the systems can be found here.

    4- Biocurator's Tasks for Pre-workshop Evaluation

    Prior to the workshop each biocurator will need to perform the following tasks:

  • Install curator logger to track time-on-task and curator web-based activities

    Biocurators assigned for testing the system should install a client-side web-browser add-on curatorlogger.xpi (download and instruction available in Downloads at the end of this page) that will allow tracking time and user activity during testing. Users will be informed as to the nature of the data being collected and asked whether they want to opt out of data collection when the browser opens. The data that is collected will be sent to one of the organizers (Ben Carterette) automatically when a session is complete.

  • Get Trained: use the examples provided by the teams to familiarize yourself with the assigned system. Also make sure you get information about the curation guidelines for the particular task

  • Perform Evaluation: curate a set of documents manually (approx. 25) and a set of documents using the selected system (approx. 25).

    Manual task: the user will be given the list of documents in the PubMed environment for manual processing, and results should be stored in a spreadsheet in the format provided by the team.

    System task: the curator will validate the output provided by the system and store the information using the system or a spreadsheet (output format to be determined by each system).

  • Complete Survey: after evaluating the systems, users should complete a survey with additional questions that may help elucidate their experience with the system.
    To complete the survey, please click here

    5- Biocurator's Checklist

      1-Get documentation and link to tool from Coordinator or Team
      2-Install curatorlogger (download available at the end of this page)
      3-Use Mozilla Firefox 4.0 or higher for the activity
      4-Remember to select the play button when starting a curation session, and the stop button when you are done
      5-Even when the logger is used, time your activity independently
      6-Practice on system with examples provided by Teams
      7-Record results of manual curation in requested format (should be provided by Coordinator or Team)
      8-Record results of system curation in requested format (should be provided by Coordinator or Team)
      9-Complete survey

    6- Important Dates

    Item | Deadline | Submit via | Comment
    Team Registration | November 15, 2011 | web | Closed
    Text Mining System Description | December 31, 2011 | email, Subject: BioCreative-2012 Track III |
    System Benchmarking Results | March 5, 2012 | email, Subject: BioCreative-2012 Track III-result |
    Submission of Practice Test | March 5, 2012 | email to Coordinator, Subject: BioCreative-2012 Track III-test |
    Interface Available for Testing | March 5, 2012 | email to Coordinator, Subject: BioCreative-2012 Track III-interface |
    Biocurator's Evaluation Results | March 20-25, 2012 | email to Coordinator, Subject: BioCreative-2012 Track III-biocurator |
    Workshop | Noon April 4 - 5 pm April 5, 2012 | | Reporting from Track III testing will be presented

    All questions pertaining to this track should be directed to Cecilia Arighi (


  • Downloads